Gerstein Lab Publications

The Gerstein lab has made it a priority to develop its cutting-edge algorithms and analyses into tools and resources in the form of downloadable programs, web servers, and databases. This is the heart of our work in transforming the big data of genomes into knowledge. For a full list, with links to download, see the lab Tools & Resources page. The source code of many of our tools is available on our lab GitHub page.

MOAT: Efficient Detection of Highly Mutated Regions with the Mutations Overburdening Annotations Tool.
L Lochovsky, J Zhang, M Gerstein (2017). Bioinformatics (published online 2017 Nov 7).
website
preprint
medline

HiC-spector: a matrix library for spectral and reproducibility analysis of Hi-C contact maps.
KK Yan, GG Yardimci, C Yan, WS Noble, M Gerstein (2017). Bioinformatics 33: 2199-2201.
website
preprint
medline

Intensification: A Resource for Amplifying Population-Genetic Signals with Protein Repeats.
J Chen, B Wang, L Regan, M Gerstein (2016). J Mol Biol 429: 435-445.
website
medline

A uniform survey of allele-specific binding and expression over 1000-Genomes-Project individuals.
J Chen, J Rozowsky, TR Galeev, A Harmanci, R Kitchen, J Bedford, A Abyzov, Y Kong, L Regan, M Gerstein (2016). Nat Commun 7: 11101.
website
preprint
medline

Identifying Allosteric Hotspots with Dynamics: Application to Inter- and Intra-species Conservation.
D Clarke, A Sethi, S Li, S Kumar, RWF Chang, J Chen, M Gerstein (2016). Structure 24: 826-837.
website
preprint
medline

LARVA: an integrative framework for large-scale analysis of recurrent variants in noncoding annotations.
L Lochovsky, J Zhang, Y Fu, E Khurana, M Gerstein (2015). Nucleic Acids Res 43: 8123-34.
website
medline

Loregic: a method to characterize the cooperative logic of regulatory factors.
D Wang, KK Yan, C Sisu, C Cheng, J Rozowsky, W Meyerson, MB Gerstein (2015). PLoS Comput Biol 11: e1004132.
website
medline

MUSIC: identification of enriched regions in ChIP-Seq experiments using a mappability-corrected multiscale signal processing framework.
A Harmanci, J Rozowsky, M Gerstein (2014). Genome Biol 15: 474.
website
medline

FunSeq2: a framework for prioritizing noncoding regulatory variants in cancer.
Y Fu, Z Liu, S Lou, J Bedford, XJ Mu, KY Yip, E Khurana, M Gerstein (2014). Genome Biol 15: 480.
website
medline

Comparative analysis of the transcriptome across distant species.
MB Gerstein, J Rozowsky, KK Yan, D Wang, C Cheng, JB Brown, CA Davis, L Hillier, C Sisu, JJ Li, B Pei, AO Harmanci, MO Duff, S Djebali, RP Alexander, BH Alver, R Auerbach, K Bell, PJ Bickel, ME Boeck, NP Boley, BW Booth, L Cherbas, P Cherbas, C Di, A Dobin, J Drenkow, B Ewing, G Fang, M Fastuca, EA Feingold, A Frankish, G Gao, PJ Good, R Guigo, A Hammonds, J Harrow, RA Hoskins, C Howald, L Hu, H Huang, TJ Hubbard, C Huynh, S Jha, D Kasper, M Kato, TC Kaufman, RR Kitchen, E Ladewig, J Lagarde, E Lai, J Leng, Z Lu, M MacCoss, G May, R McWhirter, G Merrihew, DM Miller, A Mortazavi, R Murad, B Oliver, S Olson, PJ Park, MJ Pazin, N Perrimon, D Pervouchine, V Reinke, A Reymond, G Robinson, A Samsonova, GI Saunders, F Schlesinger, A Sethi, FJ Slack, WC Spencer, MH Stoiber, P Strasbourger, A Tanzer, OA Thompson, KH Wan, G Wang, H Wang, KL Watkins, J Wen, K Wen, C Xue, L Yang, K Yip, C Zaleski, Y Zhang, H Zheng, SE Brenner, BR Graveley, SE Celniker, TR Gingeras, R Waterston (2014). Nature 512: 445-8.
website
medline

OrthoClust: an orthology-based network framework for clustering data across multiple species.
KK Yan, D Wang, J Rozowsky, H Zheng, C Cheng, M Gerstein (2014). Genome Biol 15: R100.
website
medline

Integrative annotation of variants from 1092 humans: application to cancer genomics.
E Khurana, Y Fu, V Colonna, XJ Mu, HM Kang, T Lappalainen, A Sboner, L Lochovsky, J Chen, A Harmanci, J Das, A Abyzov, S Balasubramanian, K Beal, D Chakravarty, D Challis, Y Chen, D Clarke, L Clarke, F Cunningham, US Evani, P Flicek, R Fragoza, E Garrison, R Gibbs, ZH Gumus, J Herrero, N Kitabayashi, Y Kong, K Lage, V Liluashvili, SM Lipkin, DG MacArthur, G Marth, D Muzny, TH Pers, GR Ritchie, JA Rosenfeld, C Sisu, X Wei, M Wilson, Y Xue, F Yu, 1000 Genomes Project Consortium, ET Dermitzakis, H Yu, MA Rubin, C Tyler-Smith, M Gerstein (2013). Science 342: 1235587.
website
preprint
medline

Architecture of the human regulatory network derived from ENCODE data.
MB Gerstein, A Kundaje, M Hariharan, SG Landt, KK Yan, C Cheng, XJ Mu, E Khurana, J Rozowsky, R Alexander, R Min, P Alves, A Abyzov, N Addleman, N Bhardwaj, AP Boyle, P Cayting, A Charos, DZ Chen, Y Cheng, D Clarke, C Eastman, G Euskirchen, S Frietze, Y Fu, J Gertz, F Grubert, A Harmanci, P Jain, M Kasowski, P Lacroute, JJ Leng, J Lian, H Monahan, H O'Geen, Z Ouyang, EC Partridge, D Patacsil, F Pauli, D Raha, L Ramirez, TE Reddy, B Reed, M Shi, T Slifer, J Wang, L Wu, X Yang, KY Yip, G Zilberman-Schapira, S Batzoglou, A Sidow, PJ Farnham, RM Myers, SM Weissman, M Snyder (2012). Nature 489: 91-100.
website
medline

Classification of human genomic regions based on experimentally determined binding sites of more than 100 transcription-related factors.
KY Yip, C Cheng, N Bhardwaj, JB Brown, J Leng, A Kundaje, J Rozowsky, E Birney, P Bickel, M Snyder, M Gerstein (2012). Genome Biol 13: R48.
website
medline

Understanding transcriptional regulation by integrative analysis of transcription factor binding data.
C Cheng, R Alexander, R Min, J Leng, KY Yip, J Rozowsky, KK Yan, X Dong, S Djebali, Y Ruan, CA Davis, P Carninci, T Lassmann, TR Gingeras, R Guigo, E Birney, Z Weng, M Snyder, M Gerstein (2012). Genome Res 22: 1658-67.
website
medline

Detecting and annotating genetic variations using the HugeSeq pipeline.
HY Lam, C Pan, MJ Clark, P Lacroute, R Chen, R Haraksingh, M O'Huallachain, MB Gerstein, JM Kidd, CD Bustamante, M Snyder (2012). Nat Biotechnol 30: 226-9.
website
medline

VAT: a computational framework to functionally annotate variants in personal genomes within a cloud-computing environment.
L Habegger, S Balasubramanian, DZ Chen, E Khurana, A Sboner, A Harmanci, J Rozowsky, D Clarke, M Snyder, M Gerstein (2012). Bioinformatics 28: 2267-9.
website
medline

IQSeq: integrated isoform quantification analysis based on next-generation sequencing.
J Du, J Leng, L Habegger, A Sboner, D McDermott, M Gerstein (2012). PLoS One 7: e29175.
website
medline

Integration of protein motions with molecular networks reveals different mechanisms for permanent and transient interactions.
N Bhardwaj, A Abyzov, D Clarke, C Shou, MB Gerstein (2011). Protein Sci 20: 1745-54.
website
medline

AlleleSeq: analysis of allele-specific expression and binding in a network framework.
J Rozowsky, A Abyzov, J Wang, P Alves, D Raha, A Harmanci, J Leng, R Bjornson, Y Kong, N Kitabayashi, N Bhardwaj, M Rubin, M Snyder, M Gerstein (2011). Mol Syst Biol 7: 522.
website
medline

Identification of genomic indels and structural variations using split reads.
ZD Zhang, J Du, H Lam, A Abyzov, AE Urban, M Snyder, M Gerstein (2011). BMC Genomics 12: 375.
medline

ACT: aggregation and correlation toolbox for analyses of genome tracks.
J Jee, J Rozowsky, KY Yip, L Lochovsky, R Bjornson, G Zhong, Z Zhang, Y Fu, J Wang, Z Weng, M Gerstein (2011). Bioinformatics 27: 1152-4.
website
medline

CNVnator: an approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing.
A Abyzov, AE Urban, M Snyder, M Gerstein (2011). Genome Res 21: 974-84.
website
medline

AGE: defining breakpoints of genomic structural variants at single-nucleotide resolution, through optimal alignments with gap excision.
A Abyzov, M Gerstein (2011). Bioinformatics 27: 595-603.
website
medline

Prediction and characterization of noncoding RNAs in C. elegans by integrating conservation, secondary structure, and high-throughput sequencing and array data.
ZJ Lu, KY Yip, G Wang, C Shou, LW Hillier, E Khurana, A Agarwal, R Auerbach, J Rozowsky, C Cheng, M Kato, DM Miller, F Slack, M Snyder, RH Waterston, V Reinke, MB Gerstein (2011). Genome Res 21: 276-85.
website
preprint
medline

RSEQtools: a modular framework to analyze RNA-Seq data using compact, anonymized data summaries.
L Habegger, A Sboner, TA Gianoulis, J Rozowsky, A Agarwal, M Snyder, M Gerstein (2011). Bioinformatics 27: 281-3.
website
medline

FusionSeq: a modular framework for finding gene fusions by analyzing paired-end RNA-sequencing data.
A Sboner, L Habegger, D Pflueger, S Terry, DZ Chen, JS Rozowsky, AK Tewari, N Kitabayashi, BJ Moss, MS Chee, F Demichelis, MA Rubin, MB Gerstein (2010). Genome Biol 11: R104.
website
medline

3V: cavity, channel and cleft volume calculator and extractor.
NR Voss, M Gerstein (2010). Nucleic Acids Res 38: W555-62.
website
preprint
medline

MOTIPS: automated motif analysis for predicting targets of modular protein domains.
HY Lam, PM Kim, J Mok, R Tonikian, SS Sidhu, BE Turk, M Snyder, MB Gerstein (2010). BMC Bioinformatics 11: 243.
website
preprint
medline

Nucleotide-resolution analysis of structural variants using BreakSeq and a breakpoint library.
HY Lam, XJ Mu, AM Stutz, A Tanzer, PD Cayting, M Snyder, PM Kim, JO Korbel, MB Gerstein (2010). Nat Biotechnol 28: 47-55.
website
preprint
medline

RigidFinder: a fast and sensitive method to detect rigid blocks in large macromolecular complexes.
A Abyzov, R Bjornson, M Felipe, M Gerstein (2010). Proteins 78: 309-24.
website
preprint
medline

Integrating sequencing technologies in personal genomics: optimal low cost reconstruction of structural variants.
J Du, RD Bjornson, ZD Zhang, Y Kong, M Snyder, MB Gerstein (2009). PLoS Comput Biol 5: e1000432.
website
preprint
medline

Relating protein conformational changes to packing efficiency and disorder.
N Bhardwaj, M Gerstein (2009). Protein Sci 18: 1230-40.
website
preprint
medline

PEMer: a computational framework with simulation-based error models for inferring genomic structural variants from massive paired-end sequencing data.
JO Korbel, A Abyzov, XJ Mu, N Carriero, P Cayting, Z Zhang, M Snyder, MB Gerstein (2009). Genome Biol 10: R23.
website
preprint
medline

PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls.
J Rozowsky, G Euskirchen, RK Auerbach, ZD Zhang, T Gibson, R Bjornson, N Carriero, M Snyder, MB Gerstein (2009). Nat Biotechnol 27: 66-75.
website
preprint
medline

Pseudofam: the pseudogene families database.
HY Lam, E Khurana, G Fang, P Cayting, N Carriero, KH Cheung, MB Gerstein (2009). Nucleic Acids Res 37: D738-43.
website
preprint
medline

An integrated system for studying residue coevolution in proteins.
KY Yip, P Patel, PM Kim, DM Engelman, D McDermott, M Gerstein (2008). Bioinformatics 24: 290-2.
website
preprint
medline

Leveraging the structure of the Semantic Web to enhance information retrieval for proteomics.
A Smith, K Cheung, M Krauthammer, M Schultz, M Gerstein (2007). Bioinformatics 23: 3073-9.
website
preprint
medline

PARE: a tool for comparing protein abundance and mRNA expression data.
EZ Yu, AE Burba, M Gerstein (2007). BMC Bioinformatics 8: 309.
website
preprint
medline

Tilescope: online analysis pipeline for high-density tiling microarray data.
ZD Zhang, J Rozowsky, HY Lam, J Du, M Snyder, M Gerstein (2007). Genome Biol 8: R81.
website
preprint
medline

Pseudogene.org: a comprehensive database and comparison platform for pseudogene annotation.
JE Karro, Y Yan, D Zheng, Z Zhang, N Carriero, P Cayting, P Harrison, M Gerstein (2007). Nucleic Acids Res 35: D55-60.
website
preprint
medline

ProCAT: a data analysis approach for protein microarrays.
X Zhu, M Gerstein, M Snyder (2006). Genome Biol 7: R110.
website
preprint
medline

BoCaTFBS: a boosted cascade learner to refine the binding sites suggested by ChIP-chip experiments.
LY Wang, M Snyder, M Gerstein (2006). Genome Biol 7: R102.
website
preprint
medline

Helix Interaction Tool (HIT): a web-based tool for analysis of helix-helix interactions in proteins.
AE Burba, U Lehnert, EZ Yu, M Gerstein (2006). Bioinformatics 22: 2735-8.
website
preprint
medline

The tYNA platform for comparative interactomics: a web tool for managing, comparing and mining multiple networks.
KY Yip, H Yu, PM Kim, M Schultz, M Gerstein (2006). Bioinformatics 22: 2968-70.
website
preprint
medline

PseudoPipe: an automated pseudogene identification pipeline.
Z Zhang, N Carriero, D Zheng, J Karro, PM Harrison, M Gerstein (2006). Bioinformatics 22: 1437-9.
website
preprint
medline

The Database of Macromolecular Motions: new features added at the decade mark.
S Flores, N Echols, D Milburn, B Hespenheide, K Keating, J Lu, S Wells, EZ Yu, M Thorpe, M Gerstein (2006). Nucleic Acids Res 34: D296-301.
website
preprint
medline

PubNet: a flexible system for visualizing literature derived networks.
SM Douglas, GT Montelione, M Gerstein (2005). Genome Biol 6: R80.
website
preprint
medline

YeastHub: a semantic web use case for integrating data in the life sciences domain.
KH Cheung, KY Yip, A Smith, R Deknikker, A Masiar, M Gerstein (2005). Bioinformatics 21 Suppl 1: i85-96.
website
preprint
medline

Calculation of standard atomic volumes for RNA and comparison with proteins: RNA is packed more tightly.
NR Voss, M Gerstein (2005). J Mol Biol 346: 477-92.
website
preprint
medline

TopNet: a tool for comparing biological sub-networks, correlating protein properties with topological statistics.
H Yu, X Zhu, D Greenbaum, J Karro, M Gerstein (2004). Nucleic Acids Res 32: 328-37.
website
preprint
medline

ExpressYourself: A modular platform for processing and visualizing microarray data.
NM Luscombe, TE Royce, P Bertone, N Echols, CE Horak, JT Chang, M Snyder, M Gerstein (2003). Nucleic Acids Res 31: 3477-82.
website
preprint
medline

SPINE 2: a system for collaborative structural proteomics within a federated database framework.
CS Goh, N Lan, N Echols, SM Douglas, D Milburn, P Bertone, R Xiao, LC Ma, D Zheng, Z Wunderlich, T Acton, GT Montelione, M Gerstein (2003). Nucleic Acids Res 31: 2833-8.
website
preprint
medline

MolMovDB: analysis and visualization of conformational change and structural flexibility.
N Echols, D Milburn, M Gerstein (2003). Nucleic Acids Res 31: 478-82.
website
preprint
medline

Calculations of protein volumes: sensitivity analysis and parameter database.
J Tsai, M Gerstein (2002). Bioinformatics 18: 985-95.
website
preprint
medline

GeneCensus: genome comparisons in terms of metabolic pathway activity and protein family sharing.
J Lin, J Qian, D Greenbaum, P Bertone, R Das, N Echols, A Senes, B Stenger, M Gerstein (2002). Nucleic Acids Res 30: 4574-82.
website
preprint
medline

Beyond synexpression relationships: local clustering of time-shifted and inverted gene expression profiles identifies new, biologically relevant interactions.
J Qian, M Dolled-Filhart, J Lin, H Yu, M Gerstein (2001). J Mol Biol 314: 1053-66.
website
preprint
medline

Determining the minimum number of types necessary to represent the sizes of protein atoms.
J Tsai, N Voss, M Gerstein (2001). Bioinformatics 17: 949-56.
website
preprint
medline

Protein Geometry: Distances, Areas, and Volumes.
M Gerstein, FM Richards (2001). International Tables for Crystallography (Volume F, Chapter 22.1.1, pages 531-539; M Rossmann & E Arnold, editors; Dordrecht: Kluwer).
website
preprint

SPINE: an integrated tracking database and data mining approach for identifying feasible targets in high-throughput structural proteomics.
P Bertone, Y Kluger, N Lan, D Zheng, D Christendat, A Yee, AM Edwards, CH Arrowsmith, GT Montelione, M Gerstein (2001). Nucleic Acids Res 29: 2884-98.
website
preprint
medline

PartsList: a web-based system for dynamically ranking protein folds based on disparate attributes, including whole-genome expression and interaction information.
J Qian, B Stenger, CA Wilson, J Lin, R Jansen, SA Teichmann, J Park, WG Krebs, H Yu, V Alexandrov, N Echols, M Gerstein (2001). Nucleic Acids Res 29: 1750-64.
website
preprint
medline

Whole-genome trees based on the occurrence of folds and orthologs: implications for comparing genomes on different levels.
J Lin, M Gerstein (2000). Genome Res 10: 808-18.
website
preprint
medline

The morph server: a standardized system for analyzing and visualizing macromolecular motions in a database framework.
WG Krebs, M Gerstein (2000). Nucleic Acids Res 28: 1665-75.
website
preprint
medline

Patterns of protein-fold usage in eight microbial genomes: a comprehensive structural census.
M Gerstein (1998). Proteins 33: 518-34.
website
preprint
medline

A database of macromolecular motions.
M Gerstein, W Krebs (1998). Nucleic Acids Res 26: 4280-90.
website
preprint
medline

LPFC: an Internet library of protein family core structures.
R Schmidt, M Gerstein, RB Altman (1997). Protein Sci 6: 246-8.
website
preprint
medline
