Gerstein lab contribution to the Center of Excellence in Genomic Sciences (CEGS).

P50 HG02357-01 (PI Snyder)
9/1/01 - 8/31/06
Human Genome Array: Technology for Functional Analysis
Role: co-PI and informatics co-director

The overall grant funds the Yale Center of Excellence in Genomic Sciences (CEGS), which focuses on building human genome arrays (covering whole chromosomes, not just genes) and on developing novel technologies that use these arrays for functional analysis of the human genome. The Gerstein lab's contribution is to construct computational tools for designing and building the arrays, analyzing the resulting data, and integrating those data with other genomic information.

Further information about the center can be found at

Related material:

Year 1 report [ Not available ]
Year 2 report [ html ]
Year 3 report [ html ]
Year 4 report [ html ]
Year 6 report [ html ]
Year 7 report [ html ]
Year 7 Meeting Abstract [ html ]

Recent Yale CEGS tools and datasets available on the web (17-Sep-08) [ html ]

Articles funded by this grant:
AlleleSeq: analysis of allele-specific expression and binding in a network framework.
J Rozowsky, A Abyzov, J Wang, P Alves, D Raha, A Harmanci, J Leng, R Bjornson, Y Kong, N Kitabayashi, N Bhardwaj, M Rubin, M Snyder, M Gerstein (2011). Mol Syst Biol 7: 522.

Identification of genomic indels and structural variations using split reads.
ZD Zhang, J Du, H Lam, A Abyzov, AE Urban, M Snyder, M Gerstein (2011). BMC Genomics 12: 375.

Analysis of genomic variation in non-coding elements using population-scale sequencing data from the 1000 Genomes Project.
XJ Mu, ZJ Lu, Y Kong, HY Lam, MB Gerstein (2011). Nucleic Acids Res 39: 7058-76.

ACT: aggregation and correlation toolbox for analyses of genome tracks.
J Jee, J Rozowsky, KY Yip, L Lochovsky, R Bjornson, G Zhong, Z Zhang, Y Fu, J Wang, Z Weng, M Gerstein (2011). Bioinformatics 27: 1152-4.

CNVnator: an approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing.
A Abyzov, AE Urban, M Snyder, M Gerstein (2011). Genome Res 21: 974-84.

AGE: defining breakpoints of genomic structural variants at single-nucleotide resolution, through optimal alignments with gap excision.
A Abyzov, M Gerstein (2011). Bioinformatics 27: 595-603.

RSEQtools: a modular framework to analyze RNA-Seq data using compact, anonymized data summaries.
L Habegger, A Sboner, TA Gianoulis, J Rozowsky, A Agarwal, M Snyder, M Gerstein (2011). Bioinformatics 27: 281-3.

Analysis of diverse regulatory networks in a hierarchical context shows consistent tendencies for collaboration in the middle levels.
N Bhardwaj, KK Yan, MB Gerstein (2010). Proc Natl Acad Sci U S A 107: 6841-6.

Nucleotide-resolution analysis of structural variants using BreakSeq and a breakpoint library.
HY Lam, XJ Mu, AM Stutz, A Tanzer, PD Cayting, M Snyder, PM Kim, JO Korbel, MB Gerstein (2010). Nat Biotechnol 28: 47-55.

The relationship between the evolution of microRNA targets and the length of their UTRs.
C Cheng, N Bhardwaj, M Gerstein (2009). BMC Genomics 10: 431.

Integrated assessment of genomic correlates of protein evolutionary rate.
Y Xia, EA Franzosa, MB Gerstein (2009). PLoS Comput Biol 5: e1000413.

Personal phenotypes to go with personal genomes.
M Snyder, S Weissman, M Gerstein (2009). Mol Syst Biol 5: 273.

Quantifying environmental adaptation of metabolic pathways in metagenomics.
TA Gianoulis, J Raes, PV Patel, R Bjornson, JO Korbel, I Letunic, T Yamada, A Paccanaro, LJ Jensen, M Snyder, P Bork, MB Gerstein (2009). Proc Natl Acad Sci U S A 106: 1374-9.

Efficient yeast ChIP-Seq using multiplex short-read DNA sequencing.
P Lefrancois, GM Euskirchen, RK Auerbach, J Rozowsky, T Gibson, CM Yellman, M Gerstein, M Snyder (2009). BMC Genomics 10: 37.

A myelopoiesis-associated regulatory intergenic noncoding RNA transcript within the human HOXA cluster.
X Zhang, Z Lian, C Padden, MB Gerstein, J Rozowsky, M Snyder, TR Gingeras, P Kapranov, SM Weissman, PE Newburger (2009). Blood 113: 2526-34.

MSB: a mean-shift-based approach for the analysis of structural variation in the genome.
LY Wang, A Abyzov, JO Korbel, M Snyder, M Gerstein (2009). Genome Res 19: 106-17.

RNA-Seq: a revolutionary tool for transcriptomics.
Z Wang, M Gerstein, M Snyder (2009). Nat Rev Genet 10: 57-63.

High-resolution copy-number variation map reflects human olfactory receptor diversity and evolution.
Y Hasin, T Olender, M Khen, C Gonzaga-Jauregui, PM Kim, AE Urban, M Snyder, MB Gerstein, D Lancet, JO Korbel (2008). PLoS Genet 4: e1000249.

Analysis of copy number variants and segmental duplications in the human genome: Evidence for a change in the process of formation in recent evolutionary history.
PM Kim, HY Lam, AE Urban, JO Korbel, J Affourtit, F Grubert, X Chen, S Weissman, M Snyder, MB Gerstein (2008). Genome Res 18: 1865-74.

The transcriptional landscape of the yeast genome defined by RNA sequencing.
U Nagalakshmi, Z Wang, K Waern, C Shou, D Raha, M Gerstein, M Snyder (2008). Science 320: 1344-9.

Rapid evolution by positive Darwinian selection in T-cell antigen CD4 in primates.
ZD Zhang, G Weinstock, M Gerstein (2008). J Mol Evol 66: 446-56.

An integrated system for studying residue coevolution in proteins.
KY Yip, P Patel, PM Kim, DM Engelman, D McDermott, M Gerstein (2008). Bioinformatics 24: 290-2.

Positive selection at the protein network periphery: evaluation in terms of structural constraints and cellular context.
PM Kim, JO Korbel, MB Gerstein (2007). Proc Natl Acad Sci U S A 104: 20274-9.

Paired-end mapping reveals extensive structural variation in the human genome.
JO Korbel, AE Urban, JP Affourtit, B Godwin, F Grubert, JF Simons, PM Kim, D Palejev, NJ Carriero, L Du, BE Taillon, Z Chen, A Tanzer, AC Saunders, J Chi, F Yang, NP Carter, ME Hurles, SM Weissman, TT Harkins, MB Gerstein, M Egholm, M Snyder (2007). Science 318: 420-6.

Toward a universal microarray: prediction of gene expression through nearest-neighbor probe sequence identification.
TE Royce, JS Rozowsky, MB Gerstein (2007). Nucleic Acids Res 35: e99.

An efficient pseudomedian filter for tiling microarrays.
TE Royce, NJ Carriero, MB Gerstein (2007). BMC Bioinformatics 8: 186.

Systematic prediction and validation of breakpoints associated with copy-number variants in the human genome.
JO Korbel, AE Urban, F Grubert, J Du, TE Royce, P Starr, G Zhong, BS Emanuel, SM Weissman, M Snyder, MB Gerstein (2007). Proc Natl Acad Sci U S A 104: 10110-5.

Total ancestry measure: quantifying the similarity in tree-like classification, with genomic applications.
H Yu, R Jansen, G Stolovitzky, M Gerstein (2007). Bioinformatics 23: 2163-73.

Assessing the need for sequence-based normalization in tiling microarray experiments.
TE Royce, JS Rozowsky, MB Gerstein (2007). Bioinformatics 23: 988-97.

The ambiguous boundary between genes and pseudogenes: the dead rise up, or do they?
D Zheng, MB Gerstein (2007). Trends Genet 23: 219-24.

New insights into Acinetobacter baumannii pathogenesis revealed by high-density pyrosequencing and transposon mutagenesis.
MG Smith, TA Gianoulis, S Pukatzki, JJ Mekalanos, LN Ornston, M Gerstein, M Snyder (2007). Genes Dev 21: 601-14.

Comparative analysis of genome tiling array data reveals many novel primate-specific functional RNAs in human.
Z Zhang, AW Pang, M Gerstein (2007). BMC Evol Biol 7 Suppl 1: S14.

Positional artifacts in microarrays: experimental verification and construction of COP, an automated detection tool.
H Yu, K Nguyen, T Royce, J Qian, K Nelson, M Snyder, M Gerstein (2007). Nucleic Acids Res 35: e8.

Pseudogene.org: a comprehensive database and comparison platform for pseudogene annotation.
JE Karro, Y Yan, D Zheng, Z Zhang, N Carriero, P Cayting, P Harrison, M Gerstein (2007). Nucleic Acids Res 35: D55-60.

Genomic analysis of the hierarchical structure of regulatory networks.
H Yu, M Gerstein (2006). Proc Natl Acad Sci U S A 103: 14724-31.

Extrapolating traditional DNA microarray statistics to tiling and protein microarray technologies.
TE Royce, JS Rozowsky, NM Luscombe, O Emanuelsson, H Yu, X Zhu, M Snyder, MB Gerstein (2006). Methods Enzymol 411: 282-311.

Predicting essential genes in fungal genomes.
M Seringhaus, A Paccanaro, A Borneman, M Snyder, M Gerstein (2006). Genome Res 16: 1126-35.

The real life of pseudogenes.
M Gerstein, D Zheng (2006). Sci Am 295: 48-55.

PseudoPipe: an automated pseudogene identification pipeline.
Z Zhang, N Carriero, D Zheng, J Karro, PM Harrison, M Gerstein (2006). Bioinformatics 22: 1437-9.

High-resolution mapping of DNA copy alterations in human chromosome 22 using high-density tiling oligonucleotide arrays.
AE Urban, JO Korbel, R Selzer, T Richmond, A Hacker, GV Popescu, JF Cubells, R Green, BS Emanuel, MB Gerstein, SM Weissman, M Snyder (2006). Proc Natl Acad Sci U S A 103: 4534-9.

Design optimization methods for genomic DNA tiling arrays.
P Bertone, V Trifonov, JS Rozowsky, F Schubert, O Emanuelsson, J Karro, MY Kao, M Snyder, M Gerstein (2006). Genome Res 16: 271-81.

Global changes in STAT target selection and transcription regulation upon interferon treatments.
SE Hartman, P Bertone, AK Nath, TE Royce, M Gerstein, S Weissman, M Snyder (2005). Genes Dev 19: 2953-68.

Network security and data integrity in academia: an assessment and a proposal for large-scale archiving.
A Smith, D Greenbaum, SM Douglas, M Long, M Gerstein (2005). Genome Biol 6: 119.

Issues in the analysis of oligonucleotide tiling microarrays for transcript mapping.
TE Royce, JS Rozowsky, P Bertone, M Samanta, V Stolc, S Weissman, M Snyder, M Gerstein (2005). Trends Genet 21: 466-75.

Integrated pseudogene annotation for human chromosome 22: evidence for transcription.
D Zheng, Z Zhang, PM Harrison, J Karro, N Carriero, M Gerstein (2005). J Mol Biol 349: 27-45.

Multi-species microarrays reveal the effect of sequence divergence on gene expression profiles.
Y Gilad, SA Rifkin, P Bertone, M Gerstein, KP White (2005). Genome Res 15: 674-80.

Transcribed processed pseudogenes in the human genome: an intermediate form of expressed retrosequence lacking protein-coding ability.
PM Harrison, D Zheng, Z Zhang, N Carriero, M Gerstein (2005). Nucleic Acids Res 33: 2374-83.

The temporal patterning microRNA let-7 regulates several transcription factors at the larval to adult transition in C. elegans.
H Grosshans, T Johnson, KL Reinert, M Gerstein, FJ Slack (2005). Dev Cell 8: 321-30.

A high productivity/low maintenance approach to high-performance computation for biomedicine: four case studies.
N Carriero, MV Osier, KH Cheung, PL Miller, M Gerstein, H Zhao, B Wu, S Rifkin, J Chang, H Zhang, K White, K Williams, M Schultz (2005). J Am Med Inform Assoc 12: 90-8.

DNA replication-timing analysis of human chromosome 22 at high resolution and different developmental states.
EJ White, O Emanuelsson, D Scalzo, T Royce, S Kosak, EJ Oakeley, S Weissman, M Gerstein, M Groudine, M Snyder, D Schubeler (2004). Proc Natl Acad Sci U S A 101: 17771-6.

Fast optimal genome tiling with applications to microarray design and homology search.
P Berman, P Bertone, B Dasgupta, M Gerstein, MY Kao, M Snyder (2004). J Comput Biol 11: 766-85.

Global identification of human transcribed sequences with genome tiling arrays.
P Bertone, V Stolc, TE Royce, JS Rozowsky, AE Urban, X Zhu, JL Rinn, W Tongprasit, M Samanta, S Weissman, M Gerstein, M Snyder (2004). Science 306: 2242-6.

Large-scale analysis of pseudogenes in the human genome.
Z Zhang, M Gerstein (2004). Curr Opin Genet Dev 14: 328-35.

Major molecular differences between mammalian sexes are involved in drug metabolism and renal function.
JL Rinn, JS Rozowsky, IJ Laurenzi, PH Petersen, K Zou, W Zhong, M Gerstein, M Snyder (2004). Dev Cell 6: 791-800.

CREB binds to multiple loci on human chromosome 22.
G Euskirchen, TE Royce, P Bertone, R Martone, JL Rinn, FK Nelson, F Sayward, NM Luscombe, P Miller, M Gerstein, S Weissman, M Snyder (2004). Mol Cell Biol 24: 3804-14.

Comparative analysis of processed pseudogenes in the mouse and human genomes.
Z Zhang, N Carriero, M Gerstein (2004). Trends Genet 20: 62-7.

Identification of novel functional elements in the human genome.
Z Lian, G Euskirchen, J Rinn, R Martone, P Bertone, S Hartman, T Royce, K Nelson, F Sayward, N Luscombe, J Yang, JL Li, P Miller, AE Urban, M Gerstein, S Weissman, M Snyder (2003). Cold Spring Harb Symp Quant Biol 68: 317-22.

Millions of years of evolution preserved: a comprehensive catalog of the processed pseudogenes in the human genome.
Z Zhang, PM Harrison, Y Liu, M Gerstein (2003). Genome Res 13: 2541-58.

A "polyORFomic" analysis of prokaryote genomes using disabled-homology filtering reveals conserved but undiscovered short ORFs.
PM Harrison, N Carriero, Y Liu, M Gerstein (2003). J Mol Biol 333: 885-92.

Prediction of regulatory networks: genome-wide identification of transcription factor targets from gene expression data.
J Qian, J Lin, NM Luscombe, H Yu, M Gerstein (2003). Bioinformatics 19: 1917-26.

Distribution of NF-kappaB-binding sites across human chromosome 22.
R Martone, G Euskirchen, P Bertone, S Hartman, TE Royce, NM Luscombe, JL Rinn, FK Nelson, P Miller, M Gerstein, S Weissman, M Snyder (2003). Proc Natl Acad Sci U S A 100: 12247-52.

Identification and correction of spurious spatial correlations in microarray data.
J Qian, Y Kluger, H Yu, M Gerstein (2003). Biotechniques 35: 42-4, 46, 48.

ExpressYourself: A modular platform for processing and visualizing microarray data.
NM Luscombe, TE Royce, P Bertone, N Echols, CE Horak, JT Chang, M Snyder, M Gerstein (2003). Nucleic Acids Res 31: 3477-82.

Of mice and men: phylogenetic footprinting aids the discovery of regulatory elements.
Z Zhang, M Gerstein (2003). J Biol 2: 11.

Identification and characterization of over 100 mitochondrial ribosomal protein pseudogenes in the human genome.
Z Zhang, M Gerstein (2003). Genomics 81: 468-80.

Genomics. Defining genes in the genomics era.
M Snyder, M Gerstein (2003). Science 300: 258-60.

Spectral biclustering of microarray data: coclustering genes and conditions.
Y Kluger, R Basri, JT Chang, M Gerstein (2003). Genome Res 13: 703-16.

The transcriptional activity of human Chromosome 22.
JL Rinn, G Euskirchen, P Bertone, R Martone, NM Luscombe, S Hartman, PM Harrison, FK Nelson, P Miller, M Gerstein, S Weissman, M Snyder (2003). Genes Dev 17: 529-40.

Identification of pseudogenes in the Drosophila melanogaster genome.
PM Harrison, D Milburn, Z Zhang, P Bertone, M Gerstein (2003). Nucleic Acids Res 31: 1033-7.

Studying genomes through the aeons: protein families, pseudogenes and proteome evolution.
PM Harrison, M Gerstein (2002). J Mol Biol 318: 1155-74.

A question of size: the eukaryotic proteome and the problems in defining it.
PM Harrison, A Kumar, N Lang, M Snyder, M Gerstein (2002). Nucleic Acids Res 30: 1083-90.

Fast optimal genome tiling with applications to microarray design and homology search.
P Berman, P Bertone, B DasGupta, M Gerstein, M-Y Kao, M Snyder (2002). Proceedings of the 2nd International Workshop on Algorithms in Bioinformatics. Springer-Verlag LNCS 2452: 419-433.

Complex transcriptional circuitry at the G1/S transition in Saccharomyces cerevisiae.
CE Horak, NM Luscombe, J Qian, P Bertone, S Piccirrillo, M Gerstein, M Snyder (2002). Genes Dev 16: 3017-33.

YMD: a microarray database for large-scale gene expression analysis.
KH Cheung, K White, J Hager, M Gerstein, V Reinke, K Nelson, P Masiar, R Srivastava, Y Li, J Li, H Zhao, J Li, DB Allison, M Snyder, P Miller, K Williams (2002). Proc AMIA Symp : 140-4.

Genomic and proteomic analysis of the myeloid differentiation program: global analysis of gene expression during induced differentiation in the MPRO cell line.
Z Lian, Y Kluger, DS Greenbaum, D Tuck, M Gerstein, N Berliner, SM Weissman, PE Newburger (2002). Blood 100: 3209-20.

Identification and analysis of over 2000 ribosomal protein pseudogenes in the human genome.
Z Zhang, P Harrison, M Gerstein (2002). Genome Res 12: 1466-82.

The current excitement in bioinformatics-analysis of whole-genome expression data: how does it relate to protein structure and function?
M Gerstein, R Jansen (2000). Curr Opin Struct Biol 10: 574-84.
