We have continued to make progress in all areas of research as planned. As in previous years, our focus has been on genome annotation, network analysis, structural genomics and macromolecular motions. We will highlight some of the research in the following paragraphs.

In genome annotation, we have extensively refined the pseudogene annotation that we began a few years ago. In Zhang et al. (2006), we refined our pipeline for in silico identification of pseudogenes. We also classified duplicated pseudogenes based on the intron-exon structure of their parent genes (Zheng and Gerstein, 2006), an approach that differs from our earlier methodology. We have identified duplicated pseudogenes in the ENCODE regions. In addition, we made extensive use of tiling microarray analyses to augment genome annotation by probing intergenic regions, and we briefly describe some of that work here. We developed new methods and optimizations for tiling arrays and critically evaluated the existing technology and the methods used to process the resulting data. Emanuelsson et al. (2007) assessed the performance of various high-density tiling arrays and highlighted significant differences between two widely used platforms. We also showed that a hidden Markov model framework can process tiling array data efficiently, performing as well as or better than previous approaches (Du et al. 2006). Finally, we developed a web resource, accessible at http://tiling.gersteinlab.org, that generates optimal tile paths from user-provided DNA sequences (Bertone et al. 2006).
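To make the hidden Markov model idea concrete, the following is a minimal sketch of segmenting a row of probe intensities into "background" and "active" stretches with the Viterbi algorithm. It is not the supervised framework of Du et al. (2006): the two-state layout, the Gaussian emission parameters, the `stay` probability, and the function names are all assumptions chosen for illustration.

```python
import math


def gaussian_logpdf(x, mean, sd):
    """Log density of a normal distribution at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def viterbi_segment(signal, means=(0.0, 2.0), sds=(1.0, 1.0), stay=0.9):
    """Most likely state path for a two-state HMM (0 = background, 1 = active).

    All parameters are illustrative assumptions, not fitted values.
    """
    if not signal:
        return []
    n_states = 2
    log_stay, log_switch = math.log(stay), math.log(1.0 - stay)
    # Uniform prior over the starting state.
    score = [math.log(0.5) + gaussian_logpdf(signal[0], means[s], sds[s])
             for s in range(n_states)]
    back = []  # back[t][s] = best previous state for state s at probe t + 1
    for x in signal[1:]:
        new_score, pointers = [], []
        for s in range(n_states):
            candidates = [score[p] + (log_stay if p == s else log_switch)
                          for p in range(n_states)]
            best_prev = max(range(n_states), key=lambda p: candidates[p])
            pointers.append(best_prev)
            new_score.append(candidates[best_prev]
                             + gaussian_logpdf(x, means[s], sds[s]))
        back.append(pointers)
        score = new_score
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    probes = [0.1, -0.2, 0.0, 2.1, 1.9, 2.2, 0.1, -0.1]
    print(viterbi_segment(probes))
```

Because switching states carries a cost, a run of consistently high probes is called as one active segment, while a single isolated high probe tends to be absorbed into the background. This smoothing is the main reason an HMM is attractive for noisy tiling array signal compared with thresholding each probe independently.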

We published several papers on network analysis. Here we elaborate on one important paper that appeared in Science (Kim et al. 2006) and on a network analysis tool (Yip et al. 2006). Most network analyses focus on mathematical modeling of networks and a theoretical understanding of network parameters. In contrast, we analyzed protein-protein interaction networks from a biological standpoint, which provides insight into the evolution of such interactions. Specifically, we looked at proteins in terms of their 3D structures. We showed that the proportion of a protein's surface area available for interaction with other proteins correlates inversely with its evolutionary rate. From this analysis, we identified two different types of network edges: a group of interactions associated with multi-interface hubs, which are similar to classical hubs, and another set associated with single-interface hubs. We also developed the topnet-like Yale Network Analyzer (tYNA), a web system for managing, comparing and mining multiple networks, both directed and undirected (Yip et al. 2006). tYNA efficiently implements methods that have proven useful in network analysis, including identifying defective cliques, finding small network motifs, calculating global statistics, and identifying hubs and bottlenecks.
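As a rough illustration of two of the statistics mentioned above, the sketch below ranks nodes of a small undirected network by degree (a common proxy for hubs) and by betweenness centrality (a common proxy for bottlenecks). It is not tYNA code: the function names, the toy edge list, and the choice of Brandes' algorithm for betweenness are assumptions made for this example.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def degrees(graph):
    """Number of neighbors of each node."""
    return {node: len(nbrs) for node, nbrs in graph.items()}


def betweenness(graph):
    """Betweenness centrality via Brandes' algorithm (unweighted, undirected)."""
    score = {node: 0.0 for node in graph}
    for source in graph:
        order = []                                # nodes in BFS visiting order
        preds = {node: [] for node in graph}      # shortest-path predecessors
        n_paths = {node: 0 for node in graph}     # shortest-path counts
        dist = {node: -1 for node in graph}
        n_paths[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    n_paths[w] += n_paths[v]
                    preds[w].append(v)
        # Accumulate pair dependencies from the farthest nodes inward.
        dependency = {node: 0.0 for node in graph}
        for w in reversed(order):
            for v in preds[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1.0 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    # Each unordered pair was counted once from each endpoint.
    return {node: value / 2.0 for node, value in score.items()}


def top_nodes(scores, k):
    """The k highest-scoring nodes, ties broken by name."""
    return sorted(scores, key=lambda n: (-scores[n], str(n)))[:k]


if __name__ == "__main__":
    # Toy network: two triangles joined through a single linker node D.
    edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"),
             ("D", "E"), ("E", "F"), ("F", "G"), ("E", "G")]
    g = build_graph(edges)
    print("hubs:", top_nodes(degrees(g), 2))
    print("bottlenecks:", top_nodes(betweenness(g), 1))
```

In the toy network, the linker node D has low degree but lies on every shortest path between the two triangles, so it ranks first by betweenness without being a hub. This is the kind of distinction between hubs and bottlenecks that the two statistics are meant to capture.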

As an extension of our work on macromolecular motions, we have developed a comprehensive package of tools for analyzing helix-helix packing in proteins (Burba et al. 2006). These tools can be used to analyze individual protein conformations or to gain insight into dynamic changes in helix interactions. For the latter purpose, we have provided a direct interface from entries in the Database of Macromolecular Motions to the Helix Interaction Tool (HIT) site.

Finally, we have published papers on how broader issues in computation affect the biological sciences. In this era, huge amounts of data are transmitted quickly via the information superhighway. In light of this change, we have reconsidered how scientific information should be shared so that anyone who seeks it can access it quickly and efficiently. To this end, we published a paper titled 'The death of the scientific paper' (Seringhaus and Gerstein, 2006), in which we argue that a traditional scientific article should be accompanied by results disseminated in a comprehensive format that is readily accessible to the public with the click of a mouse. We also published a letter in Science (Smith and Gerstein, 2006) in which we expound on the importance of "web mining" in the new science of the web.


Assessing the performance of different high-density tiling microarray strategies for mapping transcribed regions of the human genome.
O Emanuelsson, U Nagalakshmi, D Zheng, JS Rozowsky, AE Urban, J Du, Z Lian, V Stolc, S Weissman, M Snyder, MB Gerstein (2007). Genome Res 17: 886-97.

The Death of the Scientific Paper
M Seringhaus, M Gerstein (2006). The Scientist 20(9): 25.

Analytical Evolutionary Model for Protein Fold Occurrence in Genomes, Accounting for the Effects of Gene Duplication, Deletion, Acquisition and Selective Pressure
M Kamal, N Luscombe, J Qian, M Gerstein (2006). in Power Laws, Scale-Free Networks and Genome Biology (edited by EV Koonin, YI Wolf, GP Karev; Springer, New York), pages 165-193

Relating three-dimensional structures to protein networks provides evolutionary insights.
PM Kim, LJ Lu, Y Xia, MB Gerstein (2006). Science 314: 1938-41.

Data mining on the web
A Smith, M Gerstein (2006). Science 314: 1682; author reply 1682.

ProCAT: a data analysis approach for protein microarrays.
X Zhu, M Gerstein, M Snyder (2006). Genome Biol 7: R110.

BoCaTFBS: a boosted cascade learner to refine the binding sites suggested by ChIP-chip experiments.
LY Wang, M Snyder, M Gerstein (2006). Genome Biol 7: R102.

Helix Interaction Tool (HIT): a web-based tool for analysis of helix-helix interactions in proteins.
AE Burba, U Lehnert, EZ Yu, M Gerstein (2006). Bioinformatics 22: 2735-8.

A supervised hidden Markov model framework for efficiently segmenting tiling array data in transcriptional and ChIP-chip experiments: systematically incorporating validated biological knowledge.
J Du, JS Rozowsky, JO Korbel, ZD Zhang, TE Royce, MH Schultz, M Snyder, M Gerstein (2006). Bioinformatics 22: 3016-24.

The tYNA platform for comparative interactomics: a web tool for managing, comparing and mining multiple networks.
KY Yip, H Yu, PM Kim, M Schultz, M Gerstein (2006). Bioinformatics 22: 2968-70.

Genomic analysis of the hierarchical structure of regulatory networks.
H Yu, M Gerstein (2006). Proc Natl Acad Sci U S A 103: 14724-31.

Extrapolating traditional DNA microarray statistics to tiling and protein microarray technologies.
TE Royce, JS Rozowsky, NM Luscombe, O Emanuelsson, H Yu, X Zhu, M Snyder, MB Gerstein (2006). Methods Enzymol 411: 282-311.

A computational approach for identifying pseudogenes in the ENCODE regions.
D Zheng, MB Gerstein (2006). Genome Biol 7 Suppl 1: S13.1-10.

Predicting essential genes in fungal genomes.
M Seringhaus, A Paccanaro, A Borneman, M Snyder, M Gerstein (2006). Genome Res 16: 1126-35.

The real life of pseudogenes.
M Gerstein, D Zheng (2006). Sci Am 295: 48-55.

Design principles of molecular networks revealed by global comparisons and composite motifs.
H Yu, Y Xia, V Trifonov, M Gerstein (2006). Genome Biol 7: R55.

The geometry of the ribosomal polypeptide exit tunnel.
NR Voss, M Gerstein, TA Steitz, PB Moore (2006). J Mol Biol 360: 893-906.

Genomic analysis of insertion behavior and target specificity of mini-Tn7 and Tn3 transposons in Saccharomyces cerevisiae.
M Seringhaus, A Kumar, J Hartigan, M Snyder, M Gerstein (2006). Nucleic Acids Res 34: e57.

Tools needed to navigate landscape of the genome
M Gerstein (2006). Nature 440: 740.

PseudoPipe: an automated pseudogene identification pipeline.
Z Zhang, N Carriero, D Zheng, J Karro, PM Harrison, M Gerstein (2006). Bioinformatics 22: 1437-9.

Global landscape of protein complexes in the yeast Saccharomyces cerevisiae.
NJ Krogan, G Cagney, H Yu, G Zhong, X Guo, A Ignatchenko, J Li, S Pu, N Datta, AP Tikuisis, T Punna, JM Peregrin-Alvarez, M Shales, X Zhang, M Davey, MD Robinson, A Paccanaro, JE Bray, A Sheung, B Beattie, DP Richards, V Canadien, A Lalev, F Mena, P Wong, A Starostine, MM Canete, J Vlasblom, S Wu, C Orsi, SR Collins, S Chandran, R Haw, JJ Rilstone, K Gandi, NJ Thompson, G Musso, P St Onge, S Ghanny, MH Lam, G Butland, AM Altaf-Ul, S Kanaya, A Shilatifard, E O'Shea, JS Weissman, CJ Ingles, TR Hughes, J Parkinson, M Gerstein, SJ Wodak, A Emili, JF Greenblatt (2006). Nature 440: 637-43.

High-resolution mapping of DNA copy alterations in human chromosome 22 using high-density tiling oligonucleotide arrays.
AE Urban, JO Korbel, R Selzer, T Richmond, A Hacker, GV Popescu, JF Cubells, R Green, BS Emanuel, MB Gerstein, SM Weissman, M Snyder (2006). Proc Natl Acad Sci U S A 103: 4534-9.

Predicting interactions in protein networks by completing defective cliques.
H Yu, A Paccanaro, V Trifonov, M Gerstein (2006). Bioinformatics 22: 823-9.

Target hub proteins serve as master regulators of development in yeast.
AR Borneman, JA Leigh-Bell, H Yu, P Bertone, M Gerstein, M Snyder (2006). Genes Dev 20: 435-48.

Integrated prediction of the helical membrane protein interactome in yeast.
Y Xia, LJ Lu, M Gerstein (2006). J Mol Biol 357: 339-49.

The Database of Macromolecular Motions: new features added at the decade mark.
S Flores, N Echols, D Milburn, B Hespenheide, K Keating, J Lu, S Wells, EZ Yu, M Thorpe, M Gerstein (2006). Nucleic Acids Res 34: D296-301.

Design optimization methods for genomic DNA tiling arrays.
P Bertone, V Trifonov, JS Rozowsky, F Schubert, O Emanuelsson, J Karro, MY Kao, M Snyder, M Gerstein (2006). Genome Res 16: 271-81.

