Mark Gerstein
Department of Molecular Biophysics & Biochemistry
266 Whitney Avenue, Yale University
PO Box 208114, New Haven, CT 06520, USA
<Mark.Gerstein@yale.edu>
For Nature Structural Biology
1. Sonnhammer, E.L., Eddy, S.R., Birney, E.,
Bateman, A. & Durbin, R. Pfam: multiple sequence alignments and HMM-profiles
of protein domains. Nucleic Acids Res 26, 320-2 (1998).
http://www.sanger.ac.uk/Pfam
2. Tatusov, R.L., Koonin, E.V. & Lipman, D.J.
A genomic perspective on protein families. Science 278, 631-7 (1997).
http://www.ncbi.nlm.nih.gov/COG
3. Yona, G., Linial, N. & Linial, M. ProtoMap:
automatic classification of protein sequences and hierarchy of protein families.
Nucleic Acids Res 28, 49-55 (2000).
http://protomap.stanford.edu
4. Holm, L. & Sander, C. The FSSP database
of structurally aligned protein fold families. Nucleic Acids Res 22, 3600-3609
(1994).
http://www.ebi.ac.uk/dali/fssp
5. Murzin, A., Brenner, S.E., Hubbard, T. &
Chothia, C. SCOP: A Structural Classification of Proteins for the Investigation
of Sequences and Structures. J. Mol. Biol. 247, 536-540 (1995).
http://scop.mrc-lmb.cam.ac.uk/scop
6. Orengo, C.A., et al. CATH—a hierarchic classification
of protein domain structures. Structure 5, 1093-108 (1997).
http://www.biochem.ucl.ac.uk/bsm/cath
7. Brenner, S.E., Chothia, C. & Hubbard, T.J.
Population statistics of protein structures: lessons from structural classifications.
Curr Opin Struct Biol 7, 369-76 (1997).
8. Wolf, Y.I., Grishin, N.V. & Koonin, E.V.
Estimating the Number of Protein Folds and Families from Complete Genome Data.
J. Mol. Biol. 299, 897-905 (2000).
9. Altschul, S., Gish, W., Miller, W., Myers, E.W. & Lipman, D.J.
Basic local alignment search tool. J. Mol. Biol. 215, 403-410 (1990).
http://www.ncbi.nlm.nih.gov/BLAST
10. Pearson, W.R. Empirical statistical estimates
for sequence similarity searches. J Mol Biol 276, 71-84 (1998).
http://fasta.bioch.virginia.edu
11. Altschul, S.F., et al. Gapped BLAST and PSI-BLAST:
a new generation of protein database search programs. Nucleic Acids Res 25,
3389-402 (1997).
http://www.ncbi.nlm.nih.gov/BLAST
12. Kelley, L.A., MacCallum, R.M. & Sternberg,
M.J. Enhanced genome annotation using structural profiles in the program
3D-PSSM. J Mol Biol 299, 523-44 (2000).
http://www.bmm.icnet.uk/~3dpssm
13. Fischer, D. & Eisenberg, D. Predicting structures
for genome proteins. Curr Opin Struct Biol 9, 208-11 (1999).
http://www.doe-mbi.ucla.edu/people/frsvr/preds/MG/MG.html
14. Jones, D.T. GenTHREADER: an efficient and reliable
protein fold recognition method for genomic sequences. J. Mol. Biol. 287,
(1999).
http://insulin.brunel.ac.uk/threader/threader.html
15. Gerstein, M. How Representative are the Known
Structures of the Proteins in a Complete Genome? A Comprehensive Structural
Census. Folding & Design 3, 497-512 (1998).
http://bioinfo.mbb.yale.edu/genecensus
16. Gerstein, M. A Structural Census of Genomes: Comparing
Eukaryotic, Bacterial and Archaeal Genomes in terms of Protein Structure.
J. Mol. Biol. 274, 562-576 (1997).
http://bioinfo.mbb.yale.edu/genome/browser
17. Wolf, Y.I., Brenner, S.E., Bash, P.A. & Koonin,
E.V. Distribution of protein folds in the three superkingdoms of life. Genome
Res 9, 17-26 (1999).
ftp://ftp.ncbi.nlm.nih.gov/pub/koonin/FOLDS/index.html
18. Lin, J. & Gerstein, M. Whole-genome trees
based on the occurrence of folds and orthologs: implications for comparing
genomes on different levels. Genome Res 10, 808-18 (2000).
http://bioinfo.mbb.yale.edu/genome/tree
19. Gerstein, M. Patterns of Protein-Fold Usage in
Eight Microbial Genomes: A Comprehensive Structural Census. Proteins 33, 518-534
(1998).
http://bioinfo.mbb.yale.edu/partslist
20. Li, H., Dunn, J.J., Luft, B.J. & Lawson, C.L.
Crystal structure of Lyme disease antigen outer surface protein A complexed
with an Fab. Proc Natl Acad Sci U S A 94, 3584-9 (1997).
21. Thornton, J.M., Orengo, C.A., Todd, A.E. &
Pearl, F.M. Protein folds, functions and evolution. J Mol Biol 293, 333-42
(1999).
http://www.biochem.ucl.ac.uk/bsm/cathwheels
22. Karp, P.D., et al. The EcoCyc and MetaCyc databases. Nucleic Acids Res 28, 56-9 (2000).
http://ecocyc.DoubleTwist.com/ecocyc
23. Mewes, H.W., et al. MIPS: a database for genomes
and protein sequences. Nucleic Acids Res 27, 44-8 (1999).
http://www.mips.biochem.mpg.de
24. Ashburner, M., et al. Gene ontology: tool for
the unification of biology. The Gene Ontology Consortium. Nat Genet 25, 25-9
(2000).
http://geneontology.org
25. Hegyi, H. & Gerstein, M. The relationship
between protein structure and function: a comprehensive survey with application
to the yeast genome. J Mol Biol 288, 147-64 (1999).
http://bioinfo.mbb.yale.edu/genome/foldfunc
26. Martin, A.C., et al. Protein folds and functions. Structure 6, 875-84 (1998).
http://www.biochem.ucl.ac.uk/bsm/cathwheels
27. Chothia, C. & Lesk, A.M. The relation between
the divergence of sequence and structure in proteins. EMBO J. 5, 823-826 (1986).
28. Wilson, C.A., Kreychman, J. & Gerstein, M. Assessing Annotation
Transfer for Genomics: Quantifying the Relations between Protein Sequence,
Structure and Function through Traditional and Probabilistic Scores. J Mol
Biol 297, 233-249 (2000).
http://bioinfo.mbb.yale.edu/partslist/scop
29. Uetz, P., et al. A comprehensive analysis of protein-protein interactions
in Saccharomyces cerevisiae. Nature 403, 623-7 (2000).
http://depts.washington.edu/sfields/projects/YPLM
30. Lipshutz, R.J., Fodor, S.P., Gingeras, T.R. &
Lockhart, D.J. High density synthetic oligonucleotide arrays. Nat Genet 21,
20-4 (1999).
31. Brown, P.O. & Botstein, D. Exploring the new world of the genome
with DNA microarrays. Nat Genet 21, 33-7 (1999).
http://genome-www4.stanford.edu/MicroArray/SMD
32. Gygi, S.P., Rochon, Y., Franza, B.R. & Aebersold,
R. Correlation between protein and mRNA abundance in yeast. Mol Cell Biol
19, 1720-30 (1999).
33. Ross-Macdonald, P., et al. Large-scale analysis of the yeast genome
by transposon tagging and gene disruption. Nature 402, 413-8 (1999).
http://www.yale.edu/snyder
34. Jansen, R. & Gerstein, M. Analysis of the
Yeast Transcriptome with Broad Structural and Functional Categories: Characterizing
Highly Expressed Proteins. Nucleic Acids Res 28, 1481-1488 (2000).
http://bioinfo.mbb.yale.edu/genome/expression
35. Holstege, F.C., et al. Dissecting the regulatory
circuitry of a eukaryotic genome. Cell 95, 717-28 (1998).
http://web.wi.mit.edu/young/pub/regulation.html
36. Frishman, D., Heumann, K., Lesk, A. & Mewes,
H.W. Comprehensive, comprehensible, distributed and intelligent databases:
current status. Bioinformatics 14, 551-61 (1998).
37. Etzold, T., Ulyanov, A. & Argos, P. SRS: information
retrieval system for molecular biology data banks. Methods Enzymol 266, 114-28
(1996).
http://srs6.ebi.ac.uk
38. Westbrook, J.D. & Bourne, P.E. STAR/mmCIF:
an ontology for macromolecular structure. Bioinformatics 16, 159-68 (2000).
http://ndbserver.rutgers.edu/mmcif
39. Gerstein, M. E-publishing on the Web: promises,
pitfalls, and payoffs for bioinformatics. Bioinformatics 15, 429-31 (1999).