The papers below are a good selection for introducing newcomers to the lab and to the general field of bioinformatics.

One might start with Luscombe et al. (2002) for an introduction to bioinformatics.

Specific fields are highlighted in other short papers:

Data integration for function prediction
Cheng, et al., PLoS Comput. Biol. (2011); Cheng, et al., Genome Biol. (2011); Khurana, et al., PLoS Comput. Biol. (2013)
Structural Variation
Abyzov, et al., Genome Res. (2011)
Pseudogenes and loss of function
Pei, et al., Genome Biol. (2012); Khurana, et al., Nucleic Acids Res. (2010); Habegger, et al., Bioinfo. (2012)
Structure
Voss and Gerstein, Nucleic Acids Res. (2010); Bhardwaj, et al., Protein Sci. (2011)
Networks
Yan, et al., Proc Natl Acad Sci. (2010); Bhardwaj, et al., Sci Signal. (2010); Yu and Gerstein, Proc Natl Acad Sci. (2006)
Expression Analysis
Cheng, et al., Genome Res. (2012); Sboner, et al., Genome Biol. (2010); Habegger, et al., Bioinfo. (2011)
Data Science Issues
Gerstein, Nature (2012); Greenbaum, et al., PLoS Comput Biol. (2011); Greenbaum and Gerstein, SF Op Ed (2012)
Within each topic, it is probably best to start with the paper listed first.

Unfortunately, none of these papers gives much detail on the mathematical or computational aspects of the work. For that, it is best to look at intro-cs.


Interpretation of genomic variants using a unified biological network approach.
E Khurana, Y Fu, J Chen, M Gerstein (2013). PLoS Comput Biol 9: e1002886.

Genomics: ENCODE leads the way on big data.
M Gerstein (2012). Nature 489: 208.

Understanding transcriptional regulation by integrative analysis of transcription factor binding data.
C Cheng, R Alexander, R Min, J Leng, KY Yip, J Rozowsky, KK Yan, X Dong, S Djebali, Y Ruan, CA Davis, P Carninci, T Lassman, TR Gingeras, R Guigo, E Birney, Z Weng, M Snyder, M Gerstein (2012). Genome Res 22: 1658-67.

The GENCODE pseudogene resource.
B Pei, C Sisu, A Frankish, C Howald, L Habegger, XJ Mu, R Harte, S Balasubramanian, A Tanzer, M Diekhans, A Reymond, TJ Hubbard, J Harrow, MB Gerstein (2012). Genome Biol 13: R51.

VAT: a computational framework to functionally annotate variants in personal genomes within a cloud-computing environment.
L Habegger, S Balasubramanian, DZ Chen, E Khurana, A Sboner, A Harmanci, J Rozowsky, D Clarke, M Snyder, M Gerstein (2012). Bioinformatics 28: 2267-9.

Genomics and Privacy: Implications of the New Reality of Closed Data for the Field.
D Greenbaum, A Sboner, XJ Mu, M Gerstein (2011). PLoS Comput Biol 7: e1002278.

Construction and analysis of an integrated regulatory network derived from high-throughput sequencing data.
C Cheng, KK Yan, W Hwang, J Qian, N Bhardwaj, J Rozowsky, ZJ Lu, W Niu, P Alves, M Kato, M Snyder, M Gerstein (2011). PLoS Comput Biol 7: e1002190.

Integration of protein motions with molecular networks reveals different mechanisms for permanent and transient interactions.
N Bhardwaj, A Abyzov, D Clarke, C Shou, MB Gerstein (2011). Protein Sci 20: 1745-54.

CNVnator: an approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing.
A Abyzov, AE Urban, M Snyder, M Gerstein (2011). Genome Res 21: 974-84.

A statistical framework for modeling gene expression using chromatin features and application to modENCODE datasets.
C Cheng, KK Yan, KY Yip, J Rozowsky, R Alexander, C Shou, M Gerstein (2011). Genome Biol 12: R15.

RSEQtools: a modular framework to analyze RNA-Seq data using compact, anonymized data summaries.
L Habegger, A Sboner, TA Gianoulis, J Rozowsky, A Agarwal, M Snyder, M Gerstein (2011). Bioinformatics 27: 281-3.

Rewiring of transcriptional regulatory networks: hierarchy, rather than connectivity, better reflects the importance of regulators.
N Bhardwaj, PM Kim, MB Gerstein (2010). Sci Signal 3: ra79.

FusionSeq: a modular framework for finding gene fusions by analyzing paired-end RNA-sequencing data.
A Sboner, L Habegger, D Pflueger, S Terry, DZ Chen, JS Rozowsky, AK Tewari, N Kitabayashi, BJ Moss, MS Chee, F Demichelis, MA Rubin, MB Gerstein (2010). Genome Biol 11: R104.

Segmental duplications in the human genome reveal details of pseudogene formation.
E Khurana, HY Lam, C Cheng, N Carriero, P Cayting, MB Gerstein (2010). Nucleic Acids Res 38: 6997-7007.

3V: cavity, channel and cleft volume calculator and extractor.
NR Voss, M Gerstein (2010). Nucleic Acids Res 38: W555-62.

Comparing genomes to computer operating systems in terms of the topology and evolution of their regulatory control networks.
KK Yan, G Fang, N Bhardwaj, RP Alexander, M Gerstein (2010). Proc Natl Acad Sci U S A 107: 9186-91.

Genomic analysis of the hierarchical structure of regulatory networks.
H Yu, M Gerstein (2006). Proc Natl Acad Sci U S A 103: 14724-31.

