YS9900S Young Investigator (PI Gerstein)
7/1/99 - 6/30/04
W.M. Keck Foundation
Comparing Genomes in terms of Protein Structure
Role: PI

The major goals of this young investigator grant are to build a library of protein folds organizing the known protein structures and then to compare genomes, particularly those of pathogenic organisms, in terms of their usage of folds from this library. The flexible nature of the funding also allows work on a variety of related projects.

This grant has greatly benefited the Gerstein laboratory by allowing it to expand into new areas and participate in large-scale studies in the capital-intensive field of genomics.

Related material:
Proposal [ doc ] [ pdf ]
Award Announcement in Science [ gif1 ] [ gif2 ]
Year 1 report [ doc ][ html ] (figures [ doc ][ pdf ])
Year 2 report [ doc ][ html ]
Year 3 report [ doc ][ html ]
Year 3 short overview talk [ ppt ]
Year 3 long summary talk [ ppt ]
Year 4 report [ doc ][ html ]
Year 5 report [ doc ][ html ]
Year 6 report [ doc ][ html ]
URL: http://www.wmkeck.org

Articles funded by this grant:
Defining the TRiC/CCT interactome links chaperonin function to stabilization of newly made proteins with complex topologies.
AY Yam, Y Xia, HT Lin, A Burlingame, M Gerstein, J Frydman (2008). Nat Struct Mol Biol 15: 1255-62.

Genomic anonymity: have we already lost it?
D Greenbaum, J Du, M Gerstein (2008). Am J Bioeth 8: 71-4.

Seeking a new biology through text mining.
A Rzhetsky, M Seringhaus, M Gerstein (2008). Cell 134: 9-13.

Open access: taking full advantage of the content.
PE Bourne, JL Fink, M Gerstein (2008). PLoS Comput Biol 4: e1000037.

Manually structured digital abstracts: a scaffold for automatic text mining.
M Seringhaus, M Gerstein (2008). FEBS Lett 582: 1170.

Uncovering trends in gene naming.
MR Seringhaus, PD Cayting, MB Gerstein (2008). Genome Biol 9: 401.

Semantic Web Approach to Database Integration in the Life Sciences
KH Cheung, AK Smith, KYL Yip, CJO Baker, MB Gerstein (2007). in Semantic Web: Revolutionizing Knowledge Discovery in the Life Sciences (eds. C Baker and K Cheung, Springer, NY), pp. 11-30.

Semantic Web Standards: Legal and Social Issues and Implications
D Greenbaum, M Gerstein (2007). in Semantic Web: Revolutionizing Knowledge Discovery in the Life Sciences (eds. C Baker and K Cheung, Springer, NY), pp. 413-433.

Diverse cellular functions of the Hsp90 molecular chaperone uncovered using systems approaches.
AJ McClellan, Y Xia, AM Deutschbauer, RW Davis, M Gerstein, J Frydman (2007). Cell 131: 121-35.

Hinge Atlas: relating protein sequence to sites of structural flexibility.
SC Flores, LJ Lu, J Yang, N Carriero, MB Gerstein (2007). BMC Bioinformatics 8: 167.

Publishing perishing? Towards tomorrow's information architecture.
MR Seringhaus, MB Gerstein (2007). BMC Bioinformatics 8: 17.

Chemistry Nobel rich in structure.
M Seringhaus, M Gerstein (2007). Science 315: 40-1.

An interdepartmental Ph.D. program in computational biology and bioinformatics: the Yale perspective.
M Gerstein, D Greenbaum, K Cheung, PL Miller (2007). J Biomed Inform 40: 73-9.

The Death of the Scientific Paper
M Seringhaus, M Gerstein (2006). The Scientist 20(9): 25.

The geometry of the ribosomal polypeptide exit tunnel.
NR Voss, M Gerstein, TA Steitz, PB Moore (2006). J Mol Biol 360: 893-906.

The Database of Macromolecular Motions: new features added at the decade mark.
S Flores, N Echols, D Milburn, B Hespenheide, K Keating, J Lu, S Wells, EZ Yu, M Thorpe, M Gerstein (2006). Nucleic Acids Res 34: D296-301.

Multi-species microarrays reveal the effect of sequence divergence on gene expression profiles.
Y Gilad, SA Rifkin, P Bertone, M Gerstein, KP White (2005). Genome Res 15: 674-80.

Tools and databases to analyze protein flexibility; approaches to mapping implied features onto sequences.
WG Krebs, J Tsai, V Alexandrov, J Junker, R Jansen, M Gerstein (2003). Methods Enzymol 374: 544-84.

Identification and characterization of over 100 mitochondrial ribosomal protein pseudogenes in the human genome.
Z Zhang, M Gerstein (2003). Genomics 81: 468-80.

Genomics. Defining genes in the genomics era.
M Snyder, M Gerstein (2003). Science 300: 258-60.

Identification of pseudogenes in the Drosophila melanogaster genome.
PM Harrison, D Milburn, Z Zhang, P Bertone, M Gerstein (2003). Nucleic Acids Res 31: 1033-7.

Analysis of mRNA expression and protein abundance data: an approach for the comparison of the enrichment of features in the cellular population of proteins and transcripts.
D Greenbaum, R Jansen, M Gerstein (2002). Bioinformatics 18: 585-96.

Relating whole-genome expression data with protein-protein interactions.
R Jansen, D Greenbaum, M Gerstein (2002). Genome Res 12: 37-46.

Digging deep for ancient relics: a survey of protein motifs in the intergenic sequences of four eukaryotic genomes.
ZL Zhang, PM Harrison, M Gerstein (2002). J Mol Biol 323: 811-22.

Genomic analysis of membrane protein families: abundance and conserved motifs.
Y Liu, DM Engelman, M Gerstein (2002). Genome Biol 3: research0054.

Normal mode analysis of macromolecular motions in a database framework: developing mode concentration as a useful classifying statistic.
WG Krebs, V Alexandrov, CA Wilson, N Echols, H Yu, M Gerstein (2002). Proteins 48: 682-95.

The dominance of the population by a selected few: power-law behaviour applies to a wide variety of genomic properties.
NM Luscombe, J Qian, Z Zhang, T Johnson, M Gerstein (2002). Genome Biol 3: RESEARCH0040.

Functional profiling of the Saccharomyces cerevisiae genome.
G Giaever, AM Chu, L Ni, C Connelly, L Riles, S Veronneau, S Dow, A Lucau-Danila, K Anderson, B Andre, AP Arkin, A Astromoff, M El-Bakkoury, R Bangham, R Benito, S Brachat, S Campanaro, M Curtiss, K Davis, A Deutschbauer, KD Entian, P Flaherty, F Foury, DJ Garfinkel, M Gerstein, D Gotte, U Guldener, JH Hegemann, S Hempel, Z Herman, DF Jaramillo, DE Kelly, SL Kelly, P Kotter, D LaBonte, DC Lamb, N Lan, H Liang, H Liao, L Liu, C Luo, M Lussier, R Mao, P Menard, SL Ooi, JL Revuelta, CJ Roberts, M Rose, P Ross-Macdonald, B Scherens, G Schimmack, B Shafer, DD Shoemaker, S Sookhai-Mahadeo, RK Storms, JN Strathern, G Valle, M Voet, G Volckaert, CY Wang, TR Ward, J Wilhelmy, EA Winzeler, Y Yang, G Yen, E Youngman, K Yu, H Bussey, JD Boeke, M Snyder, P Philippsen, RW Davis, M Johnston (2002). Nature 418: 387-91.

SNPs on human chromosomes 21 and 22 -- analysis in terms of protein features and pseudogenes.
S Balasubramanian, P Harrison, H Hegyi, P Bertone, N Luscombe, N Echols, P McGarvey, Z Zhang, M Gerstein (2002). Pharmacogenomics 3: 393-402.

Comprehensive analysis of amino acid and nucleotide composition in eukaryotic genomes, comparing genes and pseudogenes.
N Echols, P Harrison, S Balasubramanian, NM Luscombe, P Bertone, Z Zhang, M Gerstein (2002). Nucleic Acids Res 30: 2515-23.

GATA-1 binding sites mapped in the beta-globin locus by using mammalian ChIP-chip analysis.
CE Horak, MC Mahajan, NM Luscombe, M Gerstein, SM Weissman, M Snyder (2002). Proc Natl Acad Sci U S A 99: 2924-9.

An integrated approach for finding overlooked genes in yeast.
A Kumar, PM Harrison, KH Cheung, N Lan, N Echols, P Bertone, P Miller, MB Gerstein, M Snyder (2002). Nat Biotechnol 20: 58-63.

Protein family and fold occurrence in genomes: power-law behaviour and evolutionary model.
J Qian, NM Luscombe, M Gerstein (2001). J Mol Biol 313: 673-81.

Beyond synexpression relationships: local clustering of time-shifted and inverted gene expression profiles identifies new, biologically relevant interactions.
J Qian, M Dolled-Filhart, J Lin, H Yu, M Gerstein (2001). J Mol Biol 314: 1053-66.

Digging for dead genes: an analysis of the characteristics of the pseudogene population in the Caenorhabditis elegans genome.
PM Harrison, N Echols, MB Gerstein (2001). Nucleic Acids Res 29: 818-30.

PartsList: a web-based system for dynamically ranking protein folds based on disparate attributes, including whole-genome expression and interaction information.
J Qian, B Stenger, CA Wilson, J Lin, R Jansen, SA Teichmann, J Park, WG Krebs, H Yu, V Alexandrov, N Echols, M Gerstein (2001). Nucleic Acids Res 29: 1750-64.

Integrative database analysis in structural genomics.
M Gerstein (2000). Nat Struct Biol 7 Suppl: 960-3.

Genome-wide analysis relating expression level with protein subcellular localization.
A Drawid, R Jansen, M Gerstein (2000). Trends Genet 16: 426-30.

The current excitement in bioinformatics-analysis of whole-genome expression data: how does it relate to protein structure and function?
M Gerstein, R Jansen (2000). Curr Opin Struct Biol 10: 574-84.

The stability of thermophilic proteins: a study based on comprehensive genome comparison.
R Das, M Gerstein (2000). Funct Integr Genomics 1: 76-88.

Measuring shifts in function and evolutionary opportunity using variability profiles: a case study of the globins.
GJ Naylor, M Gerstein (2000). J Mol Evol 51: 223-33.

A Bayesian system integrating expression data with sequence patterns for localizing proteins: comprehensive application to the yeast genome.
A Drawid, M Gerstein (2000). J Mol Biol 301: 1059-75.

Proteomics of Mycoplasma genitalium: identification and characterization of unannotated and atypical proteins in a small model genome.
S Balasubramanian, T Schneider, M Gerstein, L Regan (2000). Nucleic Acids Res 28: 3075-82.

Protein folds in the worm genome.
M Gerstein, J Lin, H Hegyi (2000). Pac Symp Biocomput: 30-41.

The morph server: a standardized system for analyzing and visualizing macromolecular motions in a database framework.
WG Krebs, M Gerstein (2000). Nucleic Acids Res 28: 1665-75.

