During the past year, we focused our research on three main areas:

1) Annotation of the Intergenic Regions of the Human Genome, focusing on Tiling Microarrays and Pseudogenes

2) Large-scale Prediction of Protein Function, in terms of Molecular Networks

3) Aspects of Structural Genomics: Target Data Mining, Membrane Protein Interactions, and Analysis of Protein Motions in terms of Packing

In the first area, we had a number of major accomplishments. We identified all the pseudogenes in the mouse and human genomes and compared them. We were also part of a large-scale collaborative effort to tile the entire human genome onto microarrays and to look for all the transcribed regions. In the second area, we created an analysis of the yeast genome that coupled the dynamics of gene expression to regulatory networks, allowing us to discover transient hubs and the way the regulatory network is "rewired" in different cellular states. We also developed a rigorous method for mapping protein-protein and regulatory interactions between organisms, and a practical web tool, called TopNet, for performing many molecular network calculations. In the third area, we developed a methodology for describing structural flexibility in terms of Hidden Markov Models, a sophisticated mathematical formalism. We also developed a way of mining the structural genomics targets to identify those that are most suitable for high-throughput structure determination. Finally, we embarked on new research directions, identifying particular aspects of database interoperability related to computer security.

For the coming year, we have a number of goals. We would like to inter-relate our tiling arrays with our pseudogenes to search for the elusive partially functional pseudogene -- the transcribed pseudogene. We would like to generalize our tool for molecular networks to encompass the prediction, as well as the analysis, of networks. We would like to extend our molecular motions database to better take into account the way packing constrains motions.

Publication Grid

An XML-Based Approach to Integrating Heterogeneous Yeast Genome Data
KH Cheung, D Pan, A Smith, M Seringhaus, SM Douglas, M Gerstein (2004). International Conference on Mathematics and Engineering Techniques in Medicine and Biological Sciences (METMBS): 236-242.

Computational analysis of membrane proteins: genomic occurrence, structure prediction and helix interactions.
U Lehnert, Y Xia, TE Royce, CS Goh, Y Liu, A Senes, H Yu, ZL Zhang, DM Engelman, M Gerstein (2004). Q Rev Biophys 37: 121-46.

DNA replication-timing analysis of human chromosome 22 at high resolution and different developmental states.
EJ White, O Emanuelsson, D Scalzo, T Royce, S Kosak, EJ Oakeley, S Weissman, M Gerstein, M Groudine, M Snyder, D Schubeler (2004). Proc Natl Acad Sci U S A 101: 17771-6.

Fast optimal genome tiling with applications to microarray design and homology search.
P Berman, P Bertone, B Dasgupta, M Gerstein, MY Kao, M Snyder (2004). J Comput Biol 11: 766-85.

Global identification of human transcribed sequences with genome tiling arrays.
P Bertone, V Stolc, TE Royce, JS Rozowsky, AE Urban, X Zhu, JL Rinn, W Tongprasit, M Samanta, S Weissman, M Gerstein, M Snyder (2004). Science 306: 2242-6.

The ENCODE (ENCyclopedia Of DNA Elements) Project.
The ENCODE Project Consortium (2004). Science 306: 636-40.

Information assessment on predicting protein-protein interactions.
N Lin, B Wu, R Jansen, M Gerstein, H Zhao (2004). BMC Bioinformatics 5: 154.

Regulation of gene expression by a metabolic enzyme.
DA Hall, H Zhu, X Zhu, T Royce, M Gerstein, M Snyder (2004). Science 306: 482-4.

Large-scale mutagenesis of the yeast genome using a Tn7-derived multipurpose transposon.
A Kumar, M Seringhaus, MC Biery, RJ Sarnovsky, L Umansky, S Piccirillo, M Heidtman, KH Cheung, CJ Dobry, MB Gerstein, NL Craig, M Snyder (2004). Genome Res 14: 1975-86.

Analyzing protein function on a genomic scale: the importance of gold-standard positives and negatives for network prediction.
R Jansen, M Gerstein (2004). Curr Opin Microbiol 7: 535-45.

Genomic analysis of regulatory network dynamics reveals large topological changes.
NM Luscombe, MM Babu, H Yu, M Snyder, SA Teichmann, M Gerstein (2004). Nature 431: 308-12.

Comprehensive analysis of pseudogenes in prokaryotes: widespread gene decay and failure of putative horizontally transferred genes.
Y Liu, PM Harrison, V Kunin, M Gerstein (2004). Genome Biol 5: R64.

Large-scale analysis of pseudogenes in the human genome.
Z Zhang, M Gerstein (2004). Curr Opin Genet Dev 14: 328-35.

The protein target list of the Northeast Structural Genomics Consortium.
Z Wunderlich, TB Acton, J Liu, G Kornhaber, J Everett, P Carter, N Lan, N Echols, M Gerstein, B Rost, GT Montelione (2004). Proteins 56: 181-7.

Structure and evolution of transcriptional regulatory networks.
MM Babu, NM Luscombe, L Aravind, M Gerstein, SA Teichmann (2004). Curr Opin Struct Biol 14: 283-91.

Analyzing cellular biochemistry in terms of molecular networks.
Y Xia, H Yu, R Jansen, M Seringhaus, S Baxter, D Greenbaum, H Zhao, M Gerstein (2004). Annu Rev Biochem 73: 1051-87.

Major molecular differences between mammalian sexes are involved in drug metabolism and renal function.
JL Rinn, JS Rozowsky, IJ Laurenzi, PH Petersen, K Zou, W Zhong, M Gerstein, M Snyder (2004). Dev Cell 6: 791-800.

Computer security in academia -- a potential roadblock to distributed annotation of the human genome.
D Greenbaum, SM Douglas, A Smith, J Lim, M Fischer, M Schultz, M Gerstein (2004). Nat Biotechnol 22: 771-2.

Annotation transfer between genomes: protein-protein interologs and protein-DNA regulogs.
H Yu, NM Luscombe, HX Lu, X Zhu, Y Xia, JD Han, N Bertin, S Chung, M Vidal, M Gerstein (2004). Genome Res 14: 1107-18.

Genomic analysis of essentiality within protein networks.
H Yu, D Greenbaum, HX Lu, X Zhu, M Gerstein (2004). Trends Genet 20: 227-31.

Selection and characterization of small random transmembrane proteins that bind and activate the platelet-derived growth factor beta receptor.
LL Freeman-Cook, AM Dixon, JB Frank, Y Xia, L Ely, M Gerstein, DM Engelman, D DiMaio (2004). J Mol Biol 338: 907-20.

Conformational changes associated with protein-protein interactions.
CS Goh, D Milburn, M Gerstein (2004). Curr Opin Struct Biol 14: 104-9.

CREB binds to multiple loci on human chromosome 22.
G Euskirchen, TE Royce, P Bertone, R Martone, JL Rinn, FK Nelson, F Sayward, NM Luscombe, P Miller, M Gerstein, S Weissman, M Snyder (2004). Mol Cell Biol 24: 3804-14.

A method using active-site sequence conservation to find functional shifts in protein families: application to the enzymes of central metabolism, leading to the identification of an anomalous isocitrate dehydrogenase in pathogens.
R Das, M Gerstein (2004). Proteins 55: 455-63.

Exploring the range of protein flexibility, from a structural proteomics perspective.
M Gerstein, N Echols (2004). Curr Opin Chem Biol 8: 14-9.

Transmembrane protein domains rarely use covalent domain recombination as an evolutionary mechanism.
Y Liu, M Gerstein, DM Engelman (2004). Proc Natl Acad Sci U S A 101: 3495-7.

Comparative analysis of processed pseudogenes in the mouse and human genomes.
Z Zhang, N Carriero, M Gerstein (2004). Trends Genet 20: 62-7.

Mining the structural genomics pipeline: identification of protein properties that affect high-throughput experimental analysis.
CS Goh, N Lan, SM Douglas, B Wu, N Echols, A Smith, D Milburn, GT Montelione, H Zhao, M Gerstein (2004). J Mol Biol 336: 115-30.

TopNet: a tool for comparing biological sub-networks, correlating protein properties with topological statistics.
H Yu, X Zhu, D Greenbaum, J Karro, M Gerstein (2004). Nucleic Acids Res 32: 328-37.

Using 3D Hidden Markov Models that explicitly represent spatial coordinates to model and compare protein structures.
V Alexandrov, M Gerstein (2004). BMC Bioinformatics 5: 2.

A map of the interactome network of the metazoan C. elegans.
S Li, CM Armstrong, N Bertin, H Ge, S Milstein, M Boxem, PO Vidalain, JD Han, A Chesneau, T Hao, DS Goldberg, N Li, M Martinez, JF Rual, P Lamesch, L Xu, M Tewari, SL Wong, LV Zhang, GF Berriz, L Jacotot, P Vaglio, J Reboul, T Hirozane-Kishikawa, Q Li, HW Gabel, A Elewa, B Baumgartner, DJ Rose, H Yu, S Bosak, R Sequerra, A Fraser, SE Mango, WM Saxton, S Strome, S Van Den Heuvel, F Piano, J Vandenhaute, C Sardet, M Gerstein, L Doucette-Stamm, KC Gunsalus, JW Harper, ME Cusick, FP Roth, DE Hill, M Vidal (2004). Science 303: 540-3.
