Gerstein Lab Publications



Genomics: Analysis of Pseudogenes and Intergenic Regions

We were the first group to assign pseudogenes comprehensively to the human genome and, for comparison, to the genomes of other organisms. Collectively, these studies enable us to determine the common "pseudofamilies" in various genomes and to address important evolutionary questions about the types of proteins that were present in an organism's past. In particular, they enabled us to show that the repertoire of pseudogenes in the human genome differs dramatically from that of other organisms: the human genome has many more processed pseudogenes (associated with highly transcribed genes such as those of the ribosome), whereas the genomes of other organisms have many more pseudogenes associated with environmental response proteins and horizontal transfer events. Our large-scale assignment of pseudogenes also enabled us to calibrate neutral rates of mutation in the genome precisely. Finally, we were able to couple our pseudogene assignments with the results of tiling array experiments probing the activity of intergenic regions. These studies enabled us to suggest that pseudogenes might not actually be dead at all, and that many of them are in fact actively transcribed. Our work with pseudogenes, tiling arrays, and intergenic analysis has required the development of many novel technologies, such as automatic assignment pipelines and statistical scoring schemes.


Comparative analysis of pseudogenes across three phyla.
C Sisu, B Pei, J Leng, A Frankish, Y Zhang, S Balasubramanian, R Harte, D Wang, M Rutenberg-Schoenberg, W Clark, M Diekhans, J Rozowsky, T Hubbard, J Harrow, MB Gerstein (2014). Proc Natl Acad Sci U S A 111: 13361-6.

Analysis of variable retroduplications in human populations suggests coupling of retrotransposition to cell division.
A Abyzov, R Iskow, O Gokcumen, DW Radke, S Balasubramanian, B Pei, L Habegger, The 1000 Genomes Project Consortium, C Lee, M Gerstein (2013). Genome Res 23: 2042-52.
 

The GENCODE pseudogene resource.
B Pei, C Sisu, A Frankish, C Howald, L Habegger, XJ Mu, R Harte, S Balasubramanian, A Tanzer, M Diekhans, A Reymond, TJ Hubbard, J Harrow, MB Gerstein (2012). Genome Biol 13: R51.

Gene inactivation and its implications for annotation in the era of personal genomics.
S Balasubramanian, L Habegger, A Frankish, DG MacArthur, R Harte, C Tyler-Smith, J Harrow, M Gerstein (2011). Genes Dev 25: 1-10.
 
 

Segmental duplications in the human genome reveal details of pseudogene formation.
E Khurana, HY Lam, C Cheng, N Carriero, P Cayting, MB Gerstein (2010). Nucleic Acids Res 38: 6997-7007.

Using semantic web rules to reason on an ontology of pseudogenes.
ME Holford, E Khurana, KH Cheung, M Gerstein (2010). Bioinformatics 26: i71-8.

Identification and analysis of unitary pseudogenes: historic and contemporary gene losses in humans and other primates.
ZD Zhang, A Frankish, T Hunt, J Harrow, M Gerstein (2010). Genome Biol 11: R26.
 

Comprehensive analysis of the pseudogenes of glycolytic enzymes in vertebrates: the anomalously high number of GAPDH pseudogenes highlights a recent burst of retrotrans-positional activity.
YJ Liu, D Zheng, S Balasubramanian, N Carriero, E Khurana, R Robilotto, MB Gerstein (2009). BMC Genomics 10: 480.

Small RNAs originated from pseudogenes: cis- or trans-acting?
X Guo, Z Zhang, MB Gerstein, D Zheng (2009). PLoS Comput Biol 5: e1000449.
 

Comparative analysis of processed ribosomal protein pseudogenes in four mammalian genomes.
S Balasubramanian, D Zheng, YJ Liu, G Fang, A Frankish, N Carriero, R Robilotto, P Cayting, M Gerstein (2009). Genome Biol 10: R2.

Pseudofam: the pseudogene families database.
HY Lam, E Khurana, G Fang, P Cayting, N Carriero, KH Cheung, MB Gerstein (2009). Nucleic Acids Res 37: D738-43.

Genomics: protein fossils live on as RNA.
R Sasidharan, M Gerstein (2008). Nature 453: 729-31.
 

Analysis of nuclear receptor pseudogenes in vertebrates: how the silent tell their stories.
ZD Zhang, P Cayting, G Weinstock, M Gerstein (2008). Mol Biol Evol 25: 131-43.

Pseudogenes in the ENCODE regions: consensus annotation, analysis of transcription, and evolution.
D Zheng, A Frankish, R Baertsch, P Kapranov, A Reymond, SW Choo, Y Lu, F Denoeud, SE Antonarakis, M Snyder, Y Ruan, CL Wei, TR Gingeras, R Guigo, J Harrow, MB Gerstein (2007). Genome Res 17: 839-51.

The ambiguous boundary between genes and pseudogenes: the dead rise up, or do they?
D Zheng, MB Gerstein (2007). Trends Genet 23: 219-24.

Pseudogene.org: a comprehensive database and comparison platform for pseudogene annotation.
JE Karro, Y Yan, D Zheng, Z Zhang, N Carriero, P Cayting, P Harrison, M Gerstein (2007). Nucleic Acids Res 35: D55-60.

A computational approach for identifying pseudogenes in the ENCODE regions.
D Zheng, MB Gerstein (2006). Genome Biol 7 Suppl 1: S13.1-10.

The real life of pseudogenes.
M Gerstein, D Zheng (2006). Sci Am 295: 48-55.

PseudoPipe: an automated pseudogene identification pipeline.
Z Zhang, N Carriero, D Zheng, J Karro, PM Harrison, M Gerstein (2006). Bioinformatics 22: 1437-9.

Integrated pseudogene annotation for human chromosome 22: evidence for transcription.
D Zheng, Z Zhang, PM Harrison, J Karro, N Carriero, M Gerstein (2005). J Mol Biol 349: 27-45.

Transcribed processed pseudogenes in the human genome: an intermediate form of expressed retrosequence lacking protein-coding ability.
PM Harrison, D Zheng, Z Zhang, N Carriero, M Gerstein (2005). Nucleic Acids Res 33: 2374-83.

Comprehensive analysis of pseudogenes in prokaryotes: widespread gene decay and failure of putative horizontally transferred genes.
Y Liu, PM Harrison, V Kunin, M Gerstein (2004). Genome Biol 5: R64.

Large-scale analysis of pseudogenes in the human genome.
Z Zhang, M Gerstein (2004). Curr Opin Genet Dev 14: 328-35.

Comparative analysis of processed pseudogenes in the mouse and human genomes.
Z Zhang, N Carriero, M Gerstein (2004). Trends Genet 20: 62-7.

Millions of years of evolution preserved: a comprehensive catalog of the processed pseudogenes in the human genome.
Z Zhang, PM Harrison, Y Liu, M Gerstein (2003). Genome Res 13: 2541-58.

A "polyORFomic" analysis of prokaryote genomes using disabled-homology filtering reveals conserved but undiscovered short ORFs.
PM Harrison, N Carriero, Y Liu, M Gerstein (2003). J Mol Biol 333: 885-92.

Patterns of nucleotide substitution, insertion and deletion in the human genome inferred from pseudogenes.
Z Zhang, M Gerstein (2003). Nucleic Acids Res 31: 5338-48.

The human genome has 49 cytochrome c pseudogenes, including a relic of a primordial gene that still functions in mouse.
Z Zhang, M Gerstein (2003). Gene 312: 61-72.

Identification and characterization of over 100 mitochondrial ribosomal protein pseudogenes in the human genome.
Z Zhang, M Gerstein (2003). Genomics 81: 468-80.

Identification of pseudogenes in the Drosophila melanogaster genome.
PM Harrison, D Milburn, Z Zhang, P Bertone, M Gerstein (2003). Nucleic Acids Res 31: 1033-7.

A small reservoir of disabled ORFs in the yeast genome and its implications for the dynamics of proteome evolution.
P Harrison, A Kumar, N Lan, N Echols, M Snyder, M Gerstein (2002). J Mol Biol 316: 409-19.

Studying genomes through the aeons: protein families, pseudogenes and proteome evolution.
PM Harrison, M Gerstein (2002). J Mol Biol 318: 1155-74.

Molecular fossils in the human genome: identification and analysis of the pseudogenes in chromosomes 21 and 22.
PM Harrison, H Hegyi, S Balasubramanian, NM Luscombe, P Bertone, N Echols, T Johnson, M Gerstein (2002). Genome Res 12: 272-80.

Digging deep for ancient relics: a survey of protein motifs in the intergenic sequences of four eukaryotic genomes.
ZL Zhang, PM Harrison, M Gerstein (2002). J Mol Biol 323: 811-22.

Identification and analysis of over 2000 ribosomal protein pseudogenes in the human genome.
Z Zhang, P Harrison, M Gerstein (2002). Genome Res 12: 1466-82.

SNPs on human chromosomes 21 and 22 -- analysis in terms of protein features and pseudogenes.
S Balasubramanian, P Harrison, H Hegyi, P Bertone, N Luscombe, N Echols, P McGarvey, Z Zhang, M Gerstein (2002). Pharmacogenomics 3: 393-402.

Comprehensive analysis of amino acid and nucleotide composition in eukaryotic genomes, comparing genes and pseudogenes.
N Echols, P Harrison, S Balasubramanian, NM Luscombe, P Bertone, Z Zhang, M Gerstein (2002). Nucleic Acids Res 30: 2515-23.

Digging for dead genes: an analysis of the characteristics of the pseudogene population in the Caenorhabditis elegans genome.
PM Harrison, N Echols, MB Gerstein (2001). Nucleic Acids Res 29: 818-30.

