Mining

HiC-spector: a matrix library for spectral and reproducibility analysis of Hi-C contact maps.

KK Yan, GG Yardimci, C Yan, WS Noble, M Gerstein (2017). Bioinformatics 33: 2199-2201.

DREISS: Using State-Space Models to Infer the Dynamics of Gene Expression Driven by External and Internal Regulatory Networks.

D Wang, F He, S Maslov, M Gerstein (2016). PLoS Comput Biol 12: e1005146.

High-order neural networks and kernel methods for peptide-MHC binding prediction.

PP Kuksa, MR Min, R Dugar, M Gerstein (2015). Bioinformatics 31: 3600-7.

Loregic: a method to characterize the cooperative logic of regulatory factors.

D Wang, KK Yan, C Sisu, C Cheng, J Rozowsky, W Meyerson, MB Gerstein (2015). PLoS Comput Biol 11: e1004132.

An approach for determining and measuring network hierarchy applied to comparing the phosphorylome and the regulome.

C Cheng, E Andrews, KK Yan, M Ung, D Wang, M Gerstein (2015). Genome Biol 16: 63.

MUSIC: identification of enriched regions in ChIP-Seq experiments using a mappability-corrected multiscale signal processing framework.

A Harmanci, J Rozowsky, M Gerstein (2014). Genome Biol 15: 474.

Comparative analysis of the transcriptome across distant species.

MB Gerstein, J Rozowsky, KK Yan, D Wang, C Cheng, JB Brown, CA Davis, L Hillier, C Sisu, JJ Li, B Pei, AO Harmanci, MO Duff, S Djebali, RP Alexander, BH Alver, R Auerbach, K Bell, PJ Bickel, ME Boeck, NP Boley, BW Booth, L Cherbas, P Cherbas, C Di, A Dobin, J Drenkow, B Ewing, G Fang, M Fastuca, EA Feingold, A Frankish, G Gao, PJ Good, R Guigo, A Hammonds, J Harrow, RA Hoskins, C Howald, L Hu, H Huang, TJ Hubbard, C Huynh, S Jha, D Kasper, M Kato, TC Kaufman, RR Kitchen, E Ladewig, J Lagarde, E Lai, J Leng, Z Lu, M MacCoss, G May, R McWhirter, G Merrihew, DM Miller, A Mortazavi, R Murad, B Oliver, S Olson, PJ Park, MJ Pazin, N Perrimon, D Pervouchine, V Reinke, A Reymond, G Robinson, A Samsonova, GI Saunders, F Schlesinger, A Sethi, FJ Slack, WC Spencer, MH Stoiber, P Strasbourger, A Tanzer, OA Thompson, KH Wan, G Wang, H Wang, KL Watkins, J Wen, K Wen, C Xue, L Yang, K Yip, C Zaleski, Y Zhang, H Zheng, SE Brenner, BR Graveley, SE Celniker, TR Gingeras, R Waterston (2014). Nature 512: 445-8.

OrthoClust: an orthology-based network framework for clustering data across multiple species.

KK Yan, D Wang, J Rozowsky, H Zheng, C Cheng, M Gerstein (2014). Genome Biol 15: R100.

Interpretable Sparse High-Order Boltzmann Machines for Transcription Factor Interaction Identification.

MR Min, X Ning, C Cheng, M Gerstein (2013). NIPS Workshop on Machine Learning in Computational Biology.

Machine learning and genome annotation: a match meant to be?

KY Yip, C Cheng, M Gerstein (2013). Genome Biol 14: 205.

Genomics: ENCODE leads the way on big data.

M Gerstein (2012). Nature 489: 208.

An integrated encyclopedia of DNA elements in the human genome.

ENCODE Project Consortium (2012). Nature 489: 57-74.

Architecture of the human regulatory network derived from ENCODE data.

MB Gerstein, A Kundaje, M Hariharan, SG Landt, KK Yan, C Cheng, XJ Mu, E Khurana, J Rozowsky, R Alexander, R Min, P Alves, A Abyzov, N Addleman, N Bhardwaj, AP Boyle, P Cayting, A Charos, DZ Chen, Y Cheng, D Clarke, C Eastman, G Euskirchen, S Frietze, Y Fu, J Gertz, F Grubert, A Harmanci, P Jain, M Kasowski, P Lacroute, JJ Leng, J Lian, H Monahan, H O'Geen, Z Ouyang, EC Partridge, D Patacsil, F Pauli, D Raha, L Ramirez, TE Reddy, B Reed, M Shi, T Slifer, J Wang, L Wu, X Yang, KY Yip, G Zilberman-Schapira, S Batzoglou, A Sidow, PJ Farnham, RM Myers, SM Weissman, M Snyder (2012). Nature 489: 91-100.

Classification of human genomic regions based on experimentally determined binding sites of more than 100 transcription-related factors.

KY Yip, C Cheng, N Bhardwaj, JB Brown, J Leng, A Kundaje, J Rozowsky, E Birney, P Bickel, M Snyder, M Gerstein (2012). Genome Biol 13: R48.

Understanding transcriptional regulation by integrative analysis of transcription factor binding data.

C Cheng, R Alexander, R Min, J Leng, KY Yip, J Rozowsky, KK Yan, X Dong, S Djebali, Y Ruan, CA Davis, P Carninci, T Lassman, TR Gingeras, R Guigo, E Birney, Z Weng, M Snyder, M Gerstein (2012). Genome Res 22: 1658-67.

Genome-wide analysis of chromatin features identifies histone modification sensitive and insensitive yeast transcription factors.

C Cheng, C Shou, KY Yip, MB Gerstein (2011). Genome Biol 12: R111.

Modeling the relative relationship of transcription factor binding and histone modifications to gene expression levels in mouse embryonic stem cells.

C Cheng, M Gerstein (2011). Nucleic Acids Res 40: 553-68.

Integrated assessment of genomic correlates of protein evolutionary rate.

Y Xia, EA Franzosa, MB Gerstein (2009). PLoS Comput Biol 5: e1000413.

Manually structured digital abstracts: a scaffold for automatic text mining.

M Seringhaus, M Gerstein (2008). FEBS Lett 582: 1170.

Data mining on the web.

A Smith, M Gerstein (2006). Science 314: 1682; author reply 1682.

An integrative genomic approach to uncover molecular mechanisms of prokaryotic traits.

Y Liu, J Li, L Sam, CS Goh, M Gerstein, YA Lussier (2006). PLoS Comput Biol 2: e159.

Integration of curated databases to identify genotype-phenotype associations.

CS Goh, TA Gianoulis, Y Liu, J Li, A Paccanaro, YA Lussier, M Gerstein (2006). BMC Genomics 7: 257.

Assessing the limits of genomic data integration for predicting protein networks.

LJ Lu, Y Xia, A Paccanaro, H Yu, M Gerstein (2005). Genome Res 15: 945-53.

YeastHub: a semantic web use case for integrating data in the life sciences domain.

KH Cheung, KY Yip, A Smith, R Deknikker, A Masiar, M Gerstein (2005). Bioinformatics 21 Suppl 1: i85-96.

Information assessment on predicting protein-protein interactions.

N Lin, B Wu, R Jansen, M Gerstein, H Zhao (2004). BMC Bioinformatics 5: 154.

Analyzing protein function on a genomic scale: the importance of gold-standard positives and negatives for network prediction.

R Jansen, M Gerstein (2004). Curr Opin Microbiol 7: 535-45.

Genomic analysis of regulatory network dynamics reveals large topological changes.

NM Luscombe, MM Babu, H Yu, M Snyder, SA Teichmann, M Gerstein (2004). Nature 431: 308-12.

Mining the structural genomics pipeline: identification of protein properties that affect high-throughput experimental analysis.

CS Goh, N Lan, SM Douglas, B Wu, N Echols, A Smith, D Milburn, GT Montelione, H Zhao, M Gerstein (2004). J Mol Biol 336: 115-30.

Data mining crystallization databases: knowledge-based approaches to optimize protein crystal screens.

MS Kimber, F Vallee, S Houston, A Necakov, T Skarina, E Evdokimova, S Beasley, D Christendat, A Savchenko, CH Arrowsmith, M Vedadi, M Gerstein, AM Edwards (2003). Proteins 51: 562-8.

Integration of genomic datasets to predict protein complexes in yeast.

R Jansen, N Lan, J Qian, M Gerstein (2002). J Struct Funct Genomics 2: 71-81.

YMD: a microarray database for large-scale gene expression analysis.

KH Cheung, K White, J Hager, M Gerstein, V Reinke, K Nelson, P Masiar, R Srivastava, Y Li, J Li, H Zhao, J Li, DB Allison, M Snyder, P Miller, K Williams (2002). Proc AMIA Symp: 140-4.

Systematic learning of gene functional classes from DNA array expression data by using multilayer perceptrons.

A Mateos, J Dopazo, R Jansen, Y Tu, M Gerstein, G Stolovitzky (2002). Genome Res 12: 1703-15.

GeneCensus: genome comparisons in terms of metabolic pathway activity and protein family sharing.

J Lin, J Qian, D Greenbaum, P Bertone, R Das, N Echols, A Senes, B Stenger, M Gerstein (2002). Nucleic Acids Res 30: 4574-82.

Genomic and proteomic analysis of the myeloid differentiation program: global analysis of gene expression during induced differentiation in the MPRO cell line.

Z Lian, Y Kluger, DS Greenbaum, D Tuck, M Gerstein, N Berliner, SM Weissman, PE Newburger (2002). Blood 100: 3209-20.

Genomic analysis of membrane protein families: abundance and conserved motifs.

Y Liu, DM Engelman, M Gerstein (2002). Genome Biol 3: research0054.

Identification and analysis of over 2000 ribosomal protein pseudogenes in the human genome.

Z Zhang, P Harrison, M Gerstein (2002). Genome Res 12: 1466-82.

Comprehensive analysis of amino acid and nucleotide composition in eukaryotic genomes, comparing genes and pseudogenes.

N Echols, P Harrison, S Balasubramanian, NM Luscombe, P Bertone, Z Zhang, M Gerstein (2002). Nucleic Acids Res 30: 2515-23.

Integrative data mining: the new direction in bioinformatics.

P Bertone, M Gerstein (2001). IEEE Eng Med Biol Mag 20: 33-40.

SPINE: an integrated tracking database and data mining approach for identifying feasible targets in high-throughput structural proteomics.

P Bertone, Y Kluger, N Lan, D Zheng, D Christendat, A Yee, AM Edwards, CH Arrowsmith, GT Montelione, M Gerstein (2001). Nucleic Acids Res 29: 2884-98.
