Mark Gerstein
Department of Molecular Biophysics & Biochemistry
266 Whitney Avenue, Yale University
PO Box 208114, New Haven, CT 06520, USA
<Mark.Gerstein@yale.edu>
 
For Nature Structural Biology
 
1.   Sonnhammer, E.L., Eddy, S.R., Birney, E., 
    Bateman, A. & Durbin, R. Pfam: multiple sequence alignments and HMM-profiles 
    of protein domains. Nucleic Acids Res 26, 320-2 (1998).
    http://www.sanger.ac.uk/Pfam
   
2.   Tatusov, R.L., Koonin, E.V. & Lipman, D.J. 
    A genomic perspective on protein families. Science 278, 631-7 (1997).
    http://www.ncbi.nlm.nih.gov/COG
  
3.   Yona, G., Linial, N. & Linial, M. ProtoMap: 
    automatic classification of protein sequences and hierarchy of protein families. 
    Nucleic Acids Res 28, 49-55 (2000).
    http://protomap.stanford.edu
    
4.   Holm, L. & Sander, C. The FSSP database 
    of structurally aligned protein fold families. Nucleic Acids Res 22, 3600-3609 
    (1994).
    http://www.ebi.ac.uk/dali/fssp
    
5.   Murzin, A., Brenner, S.E., Hubbard, T. & 
    Chothia, C. SCOP: A Structural Classification of Proteins for the Investigation 
    of Sequences and Structures. J. Mol. Biol. 247, 536-540 (1995).
    http://scop.mrc-lmb.cam.ac.uk/scop
    
6.   Orengo, C.A., et al. CATH—a hierarchic classification 
    of protein domain structures. Structure 5, 1093-108 (1997).
    http://www.biochem.ucl.ac.uk/bsm/cath
    
7.   Brenner, S.E., Chothia, C. & Hubbard, T.J. 
    Population statistics of protein structures: lessons from structural classifications. 
    Curr Opin Struct Biol 7, 369-76 (1997).
    
8.   Wolf, Y.I., Grishin, N.V. & Koonin, E.V. 
    Estimating the Number of Protein Folds and Families from Complete Genome Data. 
    J. Mol. Biol. 299, 897-905 (2000).
    
9.   Altschul, S., Gish, W., Miller, W., Myers, E.W. & Lipman, D.J. 
    Basic local alignment search tool. J. Mol. Biol. 215, 403-410 (1990).
    http://www.ncbi.nlm.nih.gov/BLAST
    
10. Pearson, W.R. Empirical statistical estimates 
    for sequence similarity searches. J Mol Biol 276, 71-84 (1998).
    http://fasta.bioch.virginia.edu
    
11. Altschul, S.F., et al. Gapped BLAST and PSI-BLAST: 
    a new generation of protein database search programs. Nucleic Acids Res 25, 
    3389-402 (1997).
    http://www.ncbi.nlm.nih.gov/BLAST
    
12. Kelley, L.A., MacCallum, R.M. & Sternberg, 
    M.J. Enhanced genome annotation using structural profiles in the program 
    3D-PSSM. J Mol Biol 299, 523-44 (2000).
    http://www.bmm.icnet.uk/~3dpssm
    
13. Fischer, D. & Eisenberg, D. Predicting structures 
    for genome proteins. Curr Opin Struct Biol 9, 208-11 (1999).
    http://www.doe-mbi.ucla.edu/people/frsvr/preds/MG/MG.html
    
14. Jones, D.T. GenTHREADER: an efficient and reliable 
    protein fold recognition method for genomic sequences. J. Mol. Biol. 287, 
    (1999).
    http://insulin.brunel.ac.uk/threader/threader.html
    
15. Gerstein, M. How Representative are the Known 
    Structures of the Proteins in a Complete Genome? A Comprehensive Structural 
    Census. Folding & Design 3, 497-512 (1998).
    http://bioinfo.mbb.yale.edu/genecensus
    
16. Gerstein, M. A Structural Census of Genomes: Comparing 
    Eukaryotic, Bacterial and Archaeal Genomes in terms of Protein Structure. 
    J. Mol. Biol. 274, 562-576 (1997).
    http://bioinfo.mbb.yale.edu/genome/browser
    
17. Wolf, Y.I., Brenner, S.E., Bash, P.A. & Koonin, 
    E.V. Distribution of protein folds in the three superkingdoms of life. Genome 
    Res 9, 17-26 (1999).
    ftp://ftp.ncbi.nlm.nih.gov/pub/koonin/FOLDS/index.html
    
18. Lin, J. & Gerstein, M. Whole-genome trees 
    based on the occurrence of folds and orthologs: implications for comparing 
    genomes on different levels. Genome Res 10, 808-18 (2000).
    http://bioinfo.mbb.yale.edu/genome/tree
    
19. Gerstein, M. Patterns of Protein-Fold Usage in 
    Eight Microbial Genomes: A Comprehensive Structural Census. Proteins 33, 518-534 
    (1998).
    http://bioinfo.mbb.yale.edu/partslist
    
20. Li, H., Dunn, J.J., Luft, B.J. & Lawson, C.L. 
    Crystal structure of Lyme disease antigen outer surface protein A complexed 
    with an Fab. Proc Natl Acad Sci U S A 94, 3584-9 (1997).
    
21. Thornton, J.M., Orengo, C.A., Todd, A.E. & 
    Pearl, F.M. Protein folds, functions and evolution. J Mol Biol 293, 333-42 
    (1999).
    http://www.biochem.ucl.ac.uk/bsm/cathwheels
    
22. Karp, P.D., et al. The EcoCyc and MetaCyc databases. Nucleic Acids Res 28, 56-9 (2000).
    http://ecocyc.DoubleTwist.com/ecocyc
    
23. Mewes, H.W., et al. MIPS: a database for genomes 
    and protein sequences. Nucleic Acids Res 27, 44-8 (1999).
    http://www.mips.biochem.mpg.de
    
24. Ashburner, M., et al. Gene ontology: tool for 
    the unification of biology. The Gene Ontology Consortium. Nat Genet 25, 25-9 
    (2000).
    http://geneontology.org
    
25. Hegyi, H. & Gerstein, M. The relationship 
    between protein structure and function: a comprehensive survey with application 
    to the yeast genome. J Mol Biol 288, 147-64 (1999).
    http://bioinfo.mbb.yale.edu/genome/foldfunc
    
26. Martin, A.C., et al. Protein folds and functions. Structure 6, 875-84 (1998).
    http://www.biochem.ucl.ac.uk/bsm/cathwheels
    
27. Chothia, C. & Lesk, A.M. The relation between 
    the divergence of sequence and structure in proteins. EMBO J. 5, 823-826 (1986).
    
28. Wilson, C.A., Kreychman, J. & Gerstein, M. Assessing Annotation 
    Transfer for Genomics: Quantifying the Relations between Protein Sequence, 
    Structure and Function through Traditional and Probabilistic Scores. J Mol 
    Biol 297, 233-249 (2000).
    http://bioinfo.mbb.yale.edu/partslist/scop
    
29. Uetz, P., et al. A comprehensive analysis of protein-protein interactions 
    in Saccharomyces cerevisiae. Nature 403, 623-7 (2000).
    http://depts.washington.edu/sfields/projects/YPLM
    
30. Lipshutz, R.J., Fodor, S.P., Gingeras, T.R. & 
    Lockhart, D.J. High density synthetic oligonucleotide arrays. Nat Genet 21, 
    20-4 (1999).
    
31. Brown, P.O. & Botstein, D. Exploring the new world of the genome 
    with DNA microarrays. Nat Genet 21, 33-7 (1999).
    http://genome-www4.stanford.edu/MicroArray/SMD
    
32. Gygi, S.P., Rochon, Y., Franza, B.R. & Aebersold, 
    R. Correlation between protein and mRNA abundance in yeast. Mol Cell Biol 
    19, 1720-30 (1999).
    
33. Ross-Macdonald, P., et al. Large-scale analysis of the yeast genome 
    by transposon tagging and gene disruption. Nature 402, 413-8 (1999).
    http://www.yale.edu/snyder
    
34. Jansen, R. & Gerstein, M. Analysis of the 
    Yeast Transcriptome with Broad Structural and Functional Categories: Characterizing 
    Highly Expressed Proteins. Nucleic Acids Res 28, 1481-1488 (2000).
    http://bioinfo.mbb.yale.edu/genome/expression
    
35. Holstege, F.C., et al. Dissecting the regulatory 
    circuitry of a eukaryotic genome. Cell 95, 717-28 (1998).
    http://web.wi.mit.edu/young/pub/regulation.html
    
36. Frishman, D., Heumann, K., Lesk, A. & Mewes, 
    H.W. Comprehensive, comprehensible, distributed and intelligent databases: 
    current status. Bioinformatics 14, 551-61 (1998).
    
37. Etzold, T., Ulyanov, A. & Argos, P. SRS: information 
    retrieval system for molecular biology data banks. Methods Enzymol 266, 114-28 
    (1996).
    http://srs6.ebi.ac.uk
    
38. Westbrook, J.D. & Bourne, P.E. STAR/mmCIF: 
    an ontology for macromolecular structure. Bioinformatics 16, 159-68 (2000).
    http://ndbserver.rutgers.edu/mmcif
    
39. Gerstein, M. E-publishing on the Web: promises, 
    pitfalls, and payoffs for bioinformatics. Bioinformatics 15, 429-31 (1999).