Genomics: Analysis of Pseudogenes and Intergenic Regions

We were the first group to assign pseudogenes comprehensively to the human genome and, for comparison, to the genomes of other organisms. Collectively, these studies enable us to determine the common "pseudofamilies" in various genomes and to address important evolutionary questions about the types of proteins that were present in an organism's past. In particular, they enabled us to show that the repertoire of pseudogenes in the human genome differs dramatically from those of other organisms: the human genome has many more processed pseudogenes (associated with highly transcribed genes such as those of the ribosome), whereas the genomes of other organisms have many more pseudogenes associated with environmental response proteins and horizontal transfer events. Our large-scale assignment of pseudogenes also enabled us to precisely calibrate neutral rates of mutation in the genome. Finally, we were able to couple our pseudogene assignments with the results of tiling array experiments probing the activity of intergenic regions. These studies led us to suggest that pseudogenes might not be dead at all, and that many of them are actively transcribed. Our work on pseudogenes, tiling arrays, and intergenic analysis has required the development of many novel technologies, such as automatic assignment pipelines and statistical scoring schemes.


Comparative analysis of pseudogenes across three phyla.
C Sisu, B Pei, J Leng, A Frankish, Y Zhang, S Balasubramanian, R Harte, D Wang, M Rutenberg-Schoenberg, W Clark, M Diekhans, J Rozowsky, T Hubbard, J Harrow, MB Gerstein (2014). Proc Natl Acad Sci U S A 111: 13361-6.

Analysis of variable retroduplications in human populations suggests coupling of retrotransposition to cell division.
A Abyzov, R Iskow, O Gokcumen, DW Radke, S Balasubramanian, B Pei, L Habegger, 1000 Genomes Project Consortium, C Lee, M Gerstein (2013). Genome Res 23: 2042-52.

The GENCODE pseudogene resource.
B Pei, C Sisu, A Frankish, C Howald, L Habegger, XJ Mu, R Harte, S Balasubramanian, A Tanzer, M Diekhans, A Reymond, TJ Hubbard, J Harrow, MB Gerstein (2012). Genome Biol 13: R51.

Gene inactivation and its implications for annotation in the era of personal genomics.
S Balasubramanian, L Habegger, A Frankish, DG MacArthur, R Harte, C Tyler-Smith, J Harrow, M Gerstein (2011). Genes Dev 25: 1-10.

Segmental duplications in the human genome reveal details of pseudogene formation.
E Khurana, HY Lam, C Cheng, N Carriero, P Cayting, MB Gerstein (2010). Nucleic Acids Res 38: 6997-7007.

Using semantic web rules to reason on an ontology of pseudogenes.
ME Holford, E Khurana, KH Cheung, M Gerstein (2010). Bioinformatics 26: i71-8.

Identification and analysis of unitary pseudogenes: historic and contemporary gene losses in humans and other primates.
ZD Zhang, A Frankish, T Hunt, J Harrow, M Gerstein (2010). Genome Biol 11: R26.

Comprehensive analysis of the pseudogenes of glycolytic enzymes in vertebrates: the anomalously high number of GAPDH pseudogenes highlights a recent burst of retrotrans-positional activity.
YJ Liu, D Zheng, S Balasubramanian, N Carriero, E Khurana, R Robilotto, MB Gerstein (2009). BMC Genomics 10: 480.

Small RNAs originated from pseudogenes: cis- or trans-acting?
X Guo, Z Zhang, MB Gerstein, D Zheng (2009). PLoS Comput Biol 5: e1000449.

Comparative analysis of processed ribosomal protein pseudogenes in four mammalian genomes.
S Balasubramanian, D Zheng, YJ Liu, G Fang, A Frankish, N Carriero, R Robilotto, P Cayting, M Gerstein (2009). Genome Biol 10: R2.

Pseudofam: the pseudogene families database.
HY Lam, E Khurana, G Fang, P Cayting, N Carriero, KH Cheung, MB Gerstein (2009). Nucleic Acids Res 37: D738-43.

Genomics: protein fossils live on as RNA.
R Sasidharan, M Gerstein (2008). Nature 453: 729-31.

Analysis of nuclear receptor pseudogenes in vertebrates: how the silent tell their stories.
ZD Zhang, P Cayting, G Weinstock, M Gerstein (2008). Mol Biol Evol 25: 131-43.

Pseudogenes in the ENCODE regions: consensus annotation, analysis of transcription, and evolution.
D Zheng, A Frankish, R Baertsch, P Kapranov, A Reymond, SW Choo, Y Lu, F Denoeud, SE Antonarakis, M Snyder, Y Ruan, CL Wei, TR Gingeras, R Guigo, J Harrow, MB Gerstein (2007). Genome Res 17: 839-51.

The ambiguous boundary between genes and pseudogenes: the dead rise up, or do they?
D Zheng, MB Gerstein (2007). Trends Genet 23: 219-24.

Pseudogene.org: a comprehensive database and comparison platform for pseudogene annotation.
JE Karro, Y Yan, D Zheng, Z Zhang, N Carriero, P Cayting, P Harrison, M Gerstein (2007). Nucleic Acids Res 35: D55-60.

A computational approach for identifying pseudogenes in the ENCODE regions.
D Zheng, MB Gerstein (2006). Genome Biol 7 Suppl 1: S13.1-10.

The real life of pseudogenes.
M Gerstein, D Zheng (2006). Sci Am 295: 48-55.

PseudoPipe: an automated pseudogene identification pipeline.
Z Zhang, N Carriero, D Zheng, J Karro, PM Harrison, M Gerstein (2006). Bioinformatics 22: 1437-9.

Integrated pseudogene annotation for human chromosome 22: evidence for transcription.
D Zheng, Z Zhang, PM Harrison, J Karro, N Carriero, M Gerstein (2005). J Mol Biol 349: 27-45.

Transcribed processed pseudogenes in the human genome: an intermediate form of expressed retrosequence lacking protein-coding ability.
PM Harrison, D Zheng, Z Zhang, N Carriero, M Gerstein (2005). Nucleic Acids Res 33: 2374-83.

Comprehensive analysis of pseudogenes in prokaryotes: widespread gene decay and failure of putative horizontally transferred genes.
Y Liu, PM Harrison, V Kunin, M Gerstein (2004). Genome Biol 5: R64.

Large-scale analysis of pseudogenes in the human genome.
Z Zhang, M Gerstein (2004). Curr Opin Genet Dev 14: 328-35.

Comparative analysis of processed pseudogenes in the mouse and human genomes.
Z Zhang, N Carriero, M Gerstein (2004). Trends Genet 20: 62-7.

Millions of years of evolution preserved: a comprehensive catalog of the processed pseudogenes in the human genome.
Z Zhang, PM Harrison, Y Liu, M Gerstein (2003). Genome Res 13: 2541-58.

A "polyORFomic" analysis of prokaryote genomes using disabled-homology filtering reveals conserved but undiscovered short ORFs.
PM Harrison, N Carriero, Y Liu, M Gerstein (2003). J Mol Biol 333: 885-92.

Patterns of nucleotide substitution, insertion and deletion in the human genome inferred from pseudogenes.
Z Zhang, M Gerstein (2003). Nucleic Acids Res 31: 5338-48.

The human genome has 49 cytochrome c pseudogenes, including a relic of a primordial gene that still functions in mouse.
Z Zhang, M Gerstein (2003). Gene 312: 61-72.

Identification and characterization of over 100 mitochondrial ribosomal protein pseudogenes in the human genome.
Z Zhang, M Gerstein (2003). Genomics 81: 468-80.

Identification of pseudogenes in the Drosophila melanogaster genome.
PM Harrison, D Milburn, Z Zhang, P Bertone, M Gerstein (2003). Nucleic Acids Res 31: 1033-7.

A small reservoir of disabled ORFs in the yeast genome and its implications for the dynamics of proteome evolution.
P Harrison, A Kumar, N Lan, N Echols, M Snyder, M Gerstein (2002). J Mol Biol 316: 409-19.

Studying genomes through the aeons: protein families, pseudogenes and proteome evolution.
PM Harrison, M Gerstein (2002). J Mol Biol 318: 1155-74.

Molecular fossils in the human genome: identification and analysis of the pseudogenes in chromosomes 21 and 22.
PM Harrison, H Hegyi, S Balasubramanian, NM Luscombe, P Bertone, N Echols, T Johnson, M Gerstein (2002). Genome Res 12: 272-80.

Digging deep for ancient relics: a survey of protein motifs in the intergenic sequences of four eukaryotic genomes.
ZL Zhang, PM Harrison, M Gerstein (2002). J Mol Biol 323: 811-22.

Identification and analysis of over 2000 ribosomal protein pseudogenes in the human genome.
Z Zhang, P Harrison, M Gerstein (2002). Genome Res 12: 1466-82.

SNPs on human chromosomes 21 and 22 -- analysis in terms of protein features and pseudogenes.
S Balasubramanian, P Harrison, H Hegyi, P Bertone, N Luscombe, N Echols, P McGarvey, Z Zhang, M Gerstein (2002). Pharmacogenomics 3: 393-402.

Comprehensive analysis of amino acid and nucleotide composition in eukaryotic genomes, comparing genes and pseudogenes.
N Echols, P Harrison, S Balasubramanian, NM Luscombe, P Bertone, Z Zhang, M Gerstein (2002). Nucleic Acids Res 30: 2515-23.

Digging for dead genes: an analysis of the characteristics of the pseudogene population in the Caenorhabditis elegans genome.
PM Harrison, N Echols, MB Gerstein (2001). Nucleic Acids Res 29: 818-30.

