The papers below are a good selection for introducing oneself to the lab and to the general field of bioinformatics.

One might start with Luscombe et al. (2001) for an introduction to bioinformatics.

Specific fields are highlighted in other short papers:

Data integration for function prediction: Gerstein et al. (2002); Bertone and Gerstein (2001); Greenbaum et al. (2001)
Simulation on molecular structure: Gerstein and Levitt (1998); Gerstein and Chothia (1999)
Pseudogenes and genome annotation: Gerstein and Snyder (2003); Harrison and Gerstein (2002)
Structural genomics: Gerstein (2000); Teichmann et al. (1999)
E-publishing: Gerstein (1999)
Expression analysis: Gerstein and Jansen (2000)

Within each topic, it is probably best to start with the papers listed first.

Unfortunately, none of these papers gives much detail on the mathematical or computational aspects of the work. For that, it is best to look at some samples of recent research, listed below, to get a sense of the type of mining that we are doing, e.g. Yu & Gerstein (2006), Du et al. (2006), Lu et al. (2005), Yip et al. (2007), and Echols et al. (2003).


Architecture of the human regulatory network derived from ENCODE data.
MB Gerstein, A Kundaje, M Hariharan, SG Landt, KK Yan, C Cheng, XJ Mu, E Khurana, J Rozowsky, R Alexander, R Min, P Alves, A Abyzov, N Addleman, N Bhardwaj, AP Boyle, P Cayting, A Charos, DZ Chen, Y Cheng, D Clarke, C Eastman, G Euskirchen, S Frietze, Y Fu, J Gertz, F Grubert, A Harmanci, P Jain, M Kasowski, P Lacroute, JJ Leng, J Lian, H Monahan, H O'Geen, Z Ouyang, EC Partridge, D Patacsil, F Pauli, D Raha, L Ramirez, TE Reddy, B Reed, M Shi, T Slifer, J Wang, L Wu, X Yang, KY Yip, G Zilberman-Schapira, S Batzoglou, A Sidow, PJ Farnham, RM Myers, SM Weissman, M Snyder (2012). Nature 489: 91-100.

A systematic survey of loss-of-function variants in human protein-coding genes.
DG MacArthur, S Balasubramanian, A Frankish, N Huang, J Morris, K Walter, L Jostins, L Habegger, JK Pickrell, SB Montgomery, CA Albers, ZD Zhang, DF Conrad, G Lunter, H Zheng, Q Ayub, MA DePristo, E Banks, M Hu, RE Handsaker, JA Rosenfeld, M Fromer, M Jin, XJ Mu, E Khurana, K Ye, M Kay, GI Saunders, MM Suner, T Hunt, IH Barnes, C Amid, DR Carvalho-Silva, AH Bignell, C Snow, B Yngvadottir, S Bumpstead, DN Cooper, Y Xue, IG Romero, 1000 Genomes Project Consortium, J Wang, Y Li, RA Gibbs, SA McCarroll, ET Dermitzakis, JK Pritchard, JC Barrett, J Harrow, ME Hurles, MB Gerstein, C Tyler-Smith (2012). Science 335: 823-8.

Novel insights through the integration of structural and functional genomics data with protein networks.
D Clarke, N Bhardwaj, MB Gerstein (2012). J Struct Biol 179: 320-6.

AlleleSeq: analysis of allele-specific expression and binding in a network framework.
J Rozowsky, A Abyzov, J Wang, P Alves, D Raha, A Harmanci, J Leng, R Bjornson, Y Kong, N Kitabayashi, N Bhardwaj, M Rubin, M Snyder, M Gerstein (2011). Mol Syst Biol 7: 522.

Analysis of genomic variation in non-coding elements using population-scale sequencing data from the 1000 Genomes Project.
XJ Mu, ZJ Lu, Y Kong, HY Lam, MB Gerstein (2011). Nucleic Acids Res 39: 7058-76.

Gene inactivation and its implications for annotation in the era of personal genomics.
S Balasubramanian, L Habegger, A Frankish, DG MacArthur, R Harte, C Tyler-Smith, J Harrow, M Gerstein (2011). Genes Dev 25: 1-10.

Getting started in gene orthology and functional analysis.
G Fang, N Bhardwaj, R Robilotto, MB Gerstein (2010). PLoS Comput Biol 6: e1000703.

RigidFinder: a fast and sensitive method to detect rigid blocks in large macromolecular complexes.
A Abyzov, R Bjornson, M Felipe, M Gerstein (2010). Proteins 78: 309-24.

Understanding modularity in molecular networks requires dynamics.
RP Alexander, PM Kim, T Emonet, MB Gerstein (2009). Sci Signal 2: pe44.

Systematic identification of transcription factors associated with patient survival in cancers.
C Cheng, LM Li, P Alves, M Gerstein (2009). BMC Genomics 10: 225.

Comparative analysis of processed ribosomal protein pseudogenes in four mammalian genomes.
S Balasubramanian, D Zheng, YJ Liu, G Fang, A Frankish, N Carriero, R Robilotto, P Cayting, M Gerstein (2009). Genome Biol 10: R2.

PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls.
J Rozowsky, G Euskirchen, RK Auerbach, ZD Zhang, T Gibson, R Bjornson, N Carriero, M Snyder, MB Gerstein (2009). Nat Biotechnol 27: 66-75.

Pseudofam: the pseudogene families database.
HY Lam, E Khurana, G Fang, P Cayting, N Carriero, KH Cheung, MB Gerstein (2009). Nucleic Acids Res 37: D738-43.

Genomics: protein fossils live on as RNA.
R Sasidharan, M Gerstein (2008). Nature 453: 729-31.

A supervised hidden Markov model framework for efficiently segmenting tiling array data in transcriptional and ChIP-chip experiments: systematically incorporating validated biological knowledge.
J Du, JS Rozowsky, JO Korbel, ZD Zhang, TE Royce, MH Schultz, M Snyder, M Gerstein (2006). Bioinformatics 22: 3016-24.

The tYNA platform for comparative interactomics: a web tool for managing, comparing and mining multiple networks.
KY Yip, H Yu, PM Kim, M Schultz, M Gerstein (2006). Bioinformatics 22: 2968-70.

A computational approach for identifying pseudogenes in the ENCODE regions.
D Zheng, MB Gerstein (2006). Genome Biol 7 Suppl 1: S13.1-10.

Assessing the limits of genomic data integration for predicting protein networks.
LJ Lu, Y Xia, A Paccanaro, H Yu, M Gerstein (2005). Genome Res 15: 945-53.

A Bayesian networks approach for predicting protein-protein interactions from genomic data.
R Jansen, H Yu, D Greenbaum, Y Kluger, NJ Krogan, S Chung, A Emili, M Snyder, JF Greenblatt, M Gerstein (2003). Science 302: 449-53.

Genomics. Defining genes in the genomics era.
M Snyder, M Gerstein (2003). Science 300: 258-60.

MolMovDB: analysis and visualization of conformational change and structural flexibility.
N Echols, D Milburn, M Gerstein (2003). Nucleic Acids Res 31: 478-82.

Studying genomes through the aeons: protein families, pseudogenes and proteome evolution.
PM Harrison, M Gerstein (2002). J Mol Biol 318: 1155-74.

Proteomics. Integrating interactomes.
M Gerstein, N Lan, R Jansen (2002). Science 295: 284-7.

What is bioinformatics? A proposed definition and overview of the field.
NM Luscombe, D Greenbaum, M Gerstein (2001). Methods Inf Med 40: 346-58.

Interrelating different types of genomic data, from proteome to secretome: 'oming in on function.
D Greenbaum, NM Luscombe, R Jansen, J Qian, M Gerstein (2001). Genome Res 11: 1463-8.

Integrative data mining: the new direction in bioinformatics.
P Bertone, M Gerstein (2001). IEEE Eng Med Biol Mag 20: 33-40.

Integrative database analysis in structural genomics.
M Gerstein (2000). Nat Struct Biol 7 Suppl: 960-3.

The current excitement in bioinformatics-analysis of whole-genome expression data: how does it relate to protein structure and function?
M Gerstein, R Jansen (2000). Curr Opin Struct Biol 10: 574-84.

Perspectives: signal transduction. Proteins in motion.
M Gerstein, C Chothia (1999). Science 285: 1682-3.

E-publishing on the Web: promises, pitfalls, and payoffs for bioinformatics.
M Gerstein (1999). Bioinformatics 15: 429-31.

Advances in structural genomics.
SA Teichmann, C Chothia, M Gerstein (1999). Curr Opin Struct Biol 9: 390-9.

