2006 Summary

We have continued to make progress in all areas of research as planned. As in previous years, our focus has been on genome annotation, network analysis, structural genomics and macromolecular motions. The following paragraphs highlight some of this research.

In genome annotation, we have extensively refined the pseudogene annotation that we began a few years ago. In Zhang et al. (2006), we refined our pipeline for in silico identification of pseudogenes. We also classified duplicated pseudogenes according to the intron-exon structure of their parent genes (Zheng and Gerstein, 2006), an approach that differs from our earlier methodology, and used it to identify duplicated pseudogenes in the ENCODE regions.

We have also made extensive use of tiling microarray analyses to augment genome annotation by probing intergenic regions. In this area, we have developed new methods and optimizations for tiling arrays and critically evaluated the existing technology and the methods used to process the data. Emanuelsson et al. (2006) assessed the performance of various high-density tiling arrays and highlighted significant differences between two widely used high-density platforms. We have also shown that a hidden Markov model framework processes tiling array data efficiently and performs as well as or better than previous approaches (Du et al. 2006). In addition, we have developed a web resource, accessible at http://tiling.gersteinlab.org, that generates optimal tile paths from user-provided DNA sequences (Bertone et al. 2006).
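
To illustrate the general idea behind segmenting tiling array signals with a hidden Markov model, the following is a minimal sketch, not the model of Du et al. (2006). It assumes two hidden states per probe (0 = background, 1 = transcribed), Gaussian emissions for the probe signal, and a single "stay" probability between adjacent probes; the function names and all parameter values are hypothetical. Viterbi decoding then returns the most probable state for each probe.

```python
import math


def gaussian_logpdf(x, mean, sd):
    """Log density of a normal distribution N(mean, sd^2) evaluated at x."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd * math.sqrt(2.0 * math.pi))


def viterbi_segment(signals, means=(0.0, 2.0), sds=(1.0, 1.0), stay_prob=0.95):
    """Label each probe 0 (background) or 1 (transcribed) with a two-state HMM.

    Hypothetical parameters: per-state Gaussian emission means/sds and a
    symmetric probability of staying in the same state between adjacent probes.
    """
    n_states = len(means)
    log_stay = math.log(stay_prob)
    log_switch = math.log((1.0 - stay_prob) / (n_states - 1))
    log_start = math.log(1.0 / n_states)

    # score[s] = best log-probability of any state path ending in state s
    score = [log_start + gaussian_logpdf(signals[0], means[s], sds[s])
             for s in range(n_states)]
    backpointers = []
    for x in signals[1:]:
        new_score, pointers = [], []
        for s in range(n_states):
            # score of reaching state s from each possible previous state
            candidates = [score[p] + (log_stay if p == s else log_switch)
                          for p in range(n_states)]
            best_prev = max(range(n_states), key=lambda p: candidates[p])
            new_score.append(candidates[best_prev]
                             + gaussian_logpdf(x, means[s], sds[s]))
            pointers.append(best_prev)
        score = new_score
        backpointers.append(pointers)

    # trace back the most probable state path from the best final state
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical probe intensities with one elevated (transcribed-like) block
probes = [0.1, -0.3, 0.2, 2.4, 1.9, 2.2, 2.6, 0.0, -0.2]
print(viterbi_segment(probes))  # [0, 0, 0, 1, 1, 1, 1, 0, 0]
```

The transition penalty is what distinguishes this from simple thresholding: an isolated high or low probe is not enough to switch state, so the decoded segments tend to be contiguous.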

We published several papers on network analysis. Here we elaborate on one important paper that appeared in Science (Kim et al. 2006) and on a network analysis tool (Yip et al. 2006). Most network analyses focus on mathematical modeling of networks and on theoretical understanding of network parameters. In Kim et al. (2006), we instead analyzed protein-protein interaction networks from a biological standpoint, which provides insights into the evolution of such interactions. Specifically, we examined proteins in terms of their 3D structures. We showed that the proportion of a protein's surface area available for interaction with other proteins correlates inversely with its evolutionary rate. From this analysis, we identified two different types of network edges: a group of interactions associated with multi-interface hubs, which are similar to classical hubs, and another set associated with single-interface hubs. We also developed the topnet-like Yale Network Analyzer (tYNA), a Web system for managing, comparing and mining multiple networks, both directed and undirected (Yip et al. 2006). tYNA efficiently implements methods that have proven useful in network analysis, including identifying defective cliques, finding small network motifs, calculating global statistics and identifying hubs and bottlenecks.
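
tYNA's own implementation is not described here. As an illustration of two of the analyses listed above, the following is a minimal sketch, assuming an undirected, unweighted network stored as a Python adjacency dict. Hubs are taken to be the highest-degree nodes and bottlenecks the nodes with the highest shortest-path betweenness centrality, computed with Brandes' algorithm. The helper `top_fraction` and its 20% default cutoff are hypothetical choices for illustration, not tYNA parameters.

```python
from collections import deque


def degree(graph):
    """Number of neighbours of each node in an undirected adjacency dict."""
    return {node: len(nbrs) for node, nbrs in graph.items()}


def betweenness(graph):
    """Brandes' algorithm: shortest-path betweenness on an unweighted, undirected graph."""
    bc = {node: 0.0 for node in graph}
    for source in graph:
        # BFS from source, counting shortest paths (sigma) and recording predecessors
        stack = []
        preds = {node: [] for node in graph}
        sigma = {node: 0 for node in graph}
        dist = {node: -1 for node in graph}
        sigma[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # accumulate dependencies, visiting nodes in order of non-increasing distance
        delta = {node: 0.0 for node in graph}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != source:
                bc[w] += delta[w]
    # each undirected pair of endpoints was counted once from each end
    return {node: value / 2.0 for node, value in bc.items()}


def top_fraction(scores, fraction=0.2):
    """Hypothetical cutoff: nodes whose score lies in the top `fraction` (at least one)."""
    k = max(1, int(len(scores) * fraction))
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy network: two triangles joined through node D
graph = {
    "A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"],
    "D": ["C", "E"],
    "E": ["D", "F", "G"], "F": ["E", "G"], "G": ["E", "F"],
}
print(top_fraction(degree(graph), 0.3))  # hubs: ['C', 'E']
print(top_fraction(betweenness(graph)))  # bottlenecks: ['D']
```

The toy network shows why the two measures are reported separately: D has only two neighbours and is not a hub, yet every shortest path between the two triangles passes through it, so it has the highest betweenness.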

As an extension of our work on macromolecular motions, we have developed a comprehensive package of tools for analyzing helix-helix packing in proteins (Burba et al. 2006). These tools can be used to analyze individual protein conformations or to gain insight into dynamic changes in helix interactions. For the latter purpose, we provide a direct interface from entries in the Molecular Motions Database to the HIT (Helix Interaction Tool) site.

Finally, we have published papers on how broader issues in computation affect the biological sciences. Today, huge amounts of data are transmitted quickly over the Internet. In this changing landscape, we have considered how scientific information should be shared so that anyone who seeks it can access it quickly and efficiently. To this end, we published a paper titled 'The death of the scientific paper' (Seringhaus and Gerstein, 2006), in which we argue that a traditional scientific article should be accompanied by a comprehensive presentation of its results that the public can access with the click of a mouse. We also published a letter in Science (Smith and Gerstein, 2006) in which we expound on the importance of "web mining" in the new science of the Web.

Assessing the performance of different high-density tiling microarray strategies for mapping transcribed regions of the human genome.

O Emanuelsson, U Nagalakshmi, D Zheng, JS Rozowsky, AE Urban, J Du, Z Lian, V Stolc, S Weissman, M Snyder, MB Gerstein (2007). Genome Res 17: 886-97.