Resources

The Gerstein lab has made it a priority to develop its cutting-edge algorithms and analyses into tools and resources in the form of downloadable programs, web servers, and databases. This is the heart of our work in transforming the big data of genomes into knowledge. For a full list, with links to download, see the lab Tools & Resources page. The source code of many of our tools is available on our lab GitHub page.

Network propagation-based prioritization of long tail genes in 17 cancer types.

H Mohsen, V Gunasekharan, T Qing, M Seay, Y Surovtseva, S Negahban, Z Szallasi, L Pusztai, MB Gerstein (2021). Genome Biol 22: 287.

A comprehensive catalog of predicted functional upstream open reading frames in humans.

P McGillivray, R Ault, M Pawashe, R Kitchen, S Balasubramanian, M Gerstein (2018). Nucleic Acids Res 46: 3326-3338.

MOAT: efficient detection of highly mutated regions with the Mutations Overburdening Annotations Tool.

L Lochovsky, J Zhang, M Gerstein (2017). Bioinformatics 34: 1031-1033.

HiC-spector: a matrix library for spectral and reproducibility analysis of Hi-C contact maps.

KK Yan, GG Yardimci, C Yan, WS Noble, M Gerstein (2017). Bioinformatics 33: 2199-2201.

Intensification: A Resource for Amplifying Population-Genetic Signals with Protein Repeats.

J Chen, B Wang, L Regan, M Gerstein (2016). J Mol Biol 429: 435-445.

A uniform survey of allele-specific binding and expression over 1000-Genomes-Project individuals.

J Chen, J Rozowsky, TR Galeev, A Harmanci, R Kitchen, J Bedford, A Abyzov, Y Kong, L Regan, M Gerstein (2016). Nat Commun 7: 11101.

Identifying Allosteric Hotspots with Dynamics: Application to Inter- and Intra-species Conservation.

D Clarke, A Sethi, S Li, S Kumar, RWF Chang, J Chen, M Gerstein (2016). Structure 24: 826-837.

LARVA: an integrative framework for large-scale analysis of recurrent variants in noncoding annotations.

L Lochovsky, J Zhang, Y Fu, E Khurana, M Gerstein (2015). Nucleic Acids Res 43: 8123-34.

Loregic: a method to characterize the cooperative logic of regulatory factors.

D Wang, KK Yan, C Sisu, C Cheng, J Rozowsky, W Meyerson, MB Gerstein (2015). PLoS Comput Biol 11: e1004132.

MUSIC: identification of enriched regions in ChIP-Seq experiments using a mappability-corrected multiscale signal processing framework.

A Harmanci, J Rozowsky, M Gerstein (2014). Genome Biol 15: 474.

FunSeq2: a framework for prioritizing noncoding regulatory variants in cancer.

Y Fu, Z Liu, S Lou, J Bedford, XJ Mu, KY Yip, E Khurana, M Gerstein (2014). Genome Biol 15: 480.

Comparative analysis of the transcriptome across distant species.

MB Gerstein, J Rozowsky, KK Yan, D Wang, C Cheng, JB Brown, CA Davis, L Hillier, C Sisu, JJ Li, B Pei, AO Harmanci, MO Duff, S Djebali, RP Alexander, BH Alver, R Auerbach, K Bell, PJ Bickel, ME Boeck, NP Boley, BW Booth, L Cherbas, P Cherbas, C Di, A Dobin, J Drenkow, B Ewing, G Fang, M Fastuca, EA Feingold, A Frankish, G Gao, PJ Good, R Guigo, A Hammonds, J Harrow, RA Hoskins, C Howald, L Hu, H Huang, TJ Hubbard, C Huynh, S Jha, D Kasper, M Kato, TC Kaufman, RR Kitchen, E Ladewig, J Lagarde, E Lai, J Leng, Z Lu, M MacCoss, G May, R McWhirter, G Merrihew, DM Miller, A Mortazavi, R Murad, B Oliver, S Olson, PJ Park, MJ Pazin, N Perrimon, D Pervouchine, V Reinke, A Reymond, G Robinson, A Samsonova, GI Saunders, F Schlesinger, A Sethi, FJ Slack, WC Spencer, MH Stoiber, P Strasbourger, A Tanzer, OA Thompson, KH Wan, G Wang, H Wang, KL Watkins, J Wen, K Wen, C Xue, L Yang, K Yip, C Zaleski, Y Zhang, H Zheng, SE Brenner, BR Graveley, SE Celniker, TR Gingeras, R Waterston (2014). Nature 512: 445-8.

OrthoClust: an orthology-based network framework for clustering data across multiple species.

KK Yan, D Wang, J Rozowsky, H Zheng, C Cheng, M Gerstein (2014). Genome Biol 15: R100.

Integrative annotation of variants from 1092 humans: application to cancer genomics.

E Khurana, Y Fu, V Colonna, XJ Mu, HM Kang, T Lappalainen, A Sboner, L Lochovsky, J Chen, A Harmanci, J Das, A Abyzov, S Balasubramanian, K Beal, D Chakravarty, D Challis, Y Chen, D Clarke, L Clarke, F Cunningham, US Evani, P Flicek, R Fragoza, E Garrison, R Gibbs, ZH Gumus, J Herrero, N Kitabayashi, Y Kong, K Lage, V Liluashvili, SM Lipkin, DG MacArthur, G Marth, D Muzny, TH Pers, GRS Ritchie, JA Rosenfeld, C Sisu, X Wei, M Wilson, Y Xue, F Yu, 1000 Genomes Project Consortium, ET Dermitzakis, H Yu, MA Rubin, C Tyler-Smith, M Gerstein (2013). Science 342: 1235587.

Architecture of the human regulatory network derived from ENCODE data.

MB Gerstein, A Kundaje, M Hariharan, SG Landt, KK Yan, C Cheng, XJ Mu, E Khurana, J Rozowsky, R Alexander, R Min, P Alves, A Abyzov, N Addleman, N Bhardwaj, AP Boyle, P Cayting, A Charos, DZ Chen, Y Cheng, D Clarke, C Eastman, G Euskirchen, S Frietze, Y Fu, J Gertz, F Grubert, A Harmanci, P Jain, M Kasowski, P Lacroute, JJ Leng, J Lian, H Monahan, H O'Geen, Z Ouyang, EC Partridge, D Patacsil, F Pauli, D Raha, L Ramirez, TE Reddy, B Reed, M Shi, T Slifer, J Wang, L Wu, X Yang, KY Yip, G Zilberman-Schapira, S Batzoglou, A Sidow, PJ Farnham, RM Myers, SM Weissman, M Snyder (2012). Nature 489: 91-100.

Classification of human genomic regions based on experimentally determined binding sites of more than 100 transcription-related factors.

KY Yip, C Cheng, N Bhardwaj, JB Brown, J Leng, A Kundaje, J Rozowsky, E Birney, P Bickel, M Snyder, M Gerstein (2012). Genome Biol 13: R48.

Understanding transcriptional regulation by integrative analysis of transcription factor binding data.

C Cheng, R Alexander, R Min, J Leng, KY Yip, J Rozowsky, KK Yan, X Dong, S Djebali, Y Ruan, CA Davis, P Carninci, T Lassman, TR Gingeras, R Guigo, E Birney, Z Weng, M Snyder, M Gerstein (2012). Genome Res 22: 1658-67.

Detecting and annotating genetic variations using the HugeSeq pipeline.

HY Lam, C Pan, MJ Clark, P Lacroute, R Chen, R Haraksingh, M O'Huallachain, MB Gerstein, JM Kidd, CD Bustamante, M Snyder (2012). Nat Biotechnol 30: 226-9.

VAT: a computational framework to functionally annotate variants in personal genomes within a cloud-computing environment.

L Habegger, S Balasubramanian, DZ Chen, E Khurana, A Sboner, A Harmanci, J Rozowsky, D Clarke, M Snyder, M Gerstein (2012). Bioinformatics 28: 2267-9.

IQSeq: integrated isoform quantification analysis based on next-generation sequencing.

J Du, J Leng, L Habegger, A Sboner, D McDermott, M Gerstein (2012). PLoS One 7: e29175.

Integration of protein motions with molecular networks reveals different mechanisms for permanent and transient interactions.

N Bhardwaj, A Abyzov, D Clarke, C Shou, MB Gerstein (2011). Protein Sci 20: 1745-54.

AlleleSeq: analysis of allele-specific expression and binding in a network framework.

J Rozowsky, A Abyzov, J Wang, P Alves, D Raha, A Harmanci, J Leng, R Bjornson, Y Kong, N Kitabayashi, N Bhardwaj, M Rubin, M Snyder, M Gerstein (2011). Mol Syst Biol 7: 522.

Identification of genomic indels and structural variations using split reads.

ZD Zhang, J Du, H Lam, A Abyzov, AE Urban, M Snyder, M Gerstein (2011). BMC Genomics 12: 375.

ACT: aggregation and correlation toolbox for analyses of genome tracks.

J Jee, J Rozowsky, KY Yip, L Lochovsky, R Bjornson, G Zhong, Z Zhang, Y Fu, J Wang, Z Weng, M Gerstein (2011). Bioinformatics 27: 1152-4.

CNVnator: an approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing.

A Abyzov, AE Urban, M Snyder, M Gerstein (2011). Genome Res 21: 974-84.

AGE: defining breakpoints of genomic structural variants at single-nucleotide resolution, through optimal alignments with gap excision.

A Abyzov, M Gerstein (2011). Bioinformatics 27: 595-603.

Prediction and characterization of noncoding RNAs in C. elegans by integrating conservation, secondary structure, and high-throughput sequencing and array data.

ZJ Lu, KY Yip, G Wang, C Shou, LW Hillier, E Khurana, A Agarwal, R Auerbach, J Rozowsky, C Cheng, M Kato, DM Miller, F Slack, M Snyder, RH Waterston, V Reinke, MB Gerstein (2011). Genome Res 21: 276-85.

RSEQtools: a modular framework to analyze RNA-Seq data using compact, anonymized data summaries.

L Habegger, A Sboner, TA Gianoulis, J Rozowsky, A Agarwal, M Snyder, M Gerstein (2011). Bioinformatics 27: 281-3.

FusionSeq: a modular framework for finding gene fusions by analyzing paired-end RNA-sequencing data.

A Sboner, L Habegger, D Pflueger, S Terry, DZ Chen, JS Rozowsky, AK Tewari, N Kitabayashi, BJ Moss, MS Chee, F Demichelis, MA Rubin, MB Gerstein (2010). Genome Biol 11: R104.

3V: cavity, channel and cleft volume calculator and extractor.

NR Voss, M Gerstein (2010). Nucleic Acids Res 38: W555-62.

MOTIPS: automated motif analysis for predicting targets of modular protein domains.

HY Lam, PM Kim, J Mok, R Tonikian, SS Sidhu, BE Turk, M Snyder, MB Gerstein (2010). BMC Bioinformatics 11: 243.

Nucleotide-resolution analysis of structural variants using BreakSeq and a breakpoint library.

HY Lam, XJ Mu, AM Stutz, A Tanzer, PD Cayting, M Snyder, PM Kim, JO Korbel, MB Gerstein (2010). Nat Biotechnol 28: 47-55.

RigidFinder: a fast and sensitive method to detect rigid blocks in large macromolecular complexes.

A Abyzov, R Bjornson, M Felipe, M Gerstein (2010). Proteins 78: 309-24.

Integrating sequencing technologies in personal genomics: optimal low cost reconstruction of structural variants.

J Du, RD Bjornson, ZD Zhang, Y Kong, M Snyder, MB Gerstein (2009). PLoS Comput Biol 5: e1000432.

Relating protein conformational changes to packing efficiency and disorder.

N Bhardwaj, M Gerstein (2009). Protein Sci 18: 1230-40.

PEMer: a computational framework with simulation-based error models for inferring genomic structural variants from massive paired-end sequencing data.

JO Korbel, A Abyzov, XJ Mu, N Carriero, P Cayting, Z Zhang, M Snyder, MB Gerstein (2009). Genome Biol 10: R23.

PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls.

J Rozowsky, G Euskirchen, RK Auerbach, ZD Zhang, T Gibson, R Bjornson, N Carriero, M Snyder, MB Gerstein (2009). Nat Biotechnol 27: 66-75.

Pseudofam: the pseudogene families database.

HY Lam, E Khurana, G Fang, P Cayting, N Carriero, KH Cheung, MB Gerstein (2009). Nucleic Acids Res 37: D738-43.

An integrated system for studying residue coevolution in proteins.

KY Yip, P Patel, PM Kim, DM Engelman, D McDermott, M Gerstein (2008). Bioinformatics 24: 290-2.

Leveraging the structure of the Semantic Web to enhance information retrieval for proteomics.

A Smith, K Cheung, M Krauthammer, M Schultz, M Gerstein (2007). Bioinformatics 23: 3073-9.

PARE: a tool for comparing protein abundance and mRNA expression data.

EZ Yu, AE Burba, M Gerstein (2007). BMC Bioinformatics 8: 309.

Tilescope: online analysis pipeline for high-density tiling microarray data.

ZD Zhang, J Rozowsky, HY Lam, J Du, M Snyder, M Gerstein (2007). Genome Biol 8: R81.

Pseudogene.org: a comprehensive database and comparison platform for pseudogene annotation.

JE Karro, Y Yan, D Zheng, Z Zhang, N Carriero, P Cayting, P Harrison, M Gerstein (2007). Nucleic Acids Res 35: D55-60.

ProCAT: a data analysis approach for protein microarrays.

X Zhu, M Gerstein, M Snyder (2006). Genome Biol 7: R110.

BoCaTFBS: a boosted cascade learner to refine the binding sites suggested by ChIP-chip experiments.

LY Wang, M Snyder, M Gerstein (2006). Genome Biol 7: R102.

Helix Interaction Tool (HIT): a web-based tool for analysis of helix-helix interactions in proteins.

AE Burba, U Lehnert, EZ Yu, M Gerstein (2006). Bioinformatics 22: 2735-8.

The tYNA platform for comparative interactomics: a web tool for managing, comparing and mining multiple networks.

KY Yip, H Yu, PM Kim, M Schultz, M Gerstein (2006). Bioinformatics 22: 2968-70.

PseudoPipe: an automated pseudogene identification pipeline.

Z Zhang, N Carriero, D Zheng, J Karro, PM Harrison, M Gerstein (2006). Bioinformatics 22: 1437-9.

The Database of Macromolecular Motions: new features added at the decade mark.

S Flores, N Echols, D Milburn, B Hespenheide, K Keating, J Lu, S Wells, EZ Yu, M Thorpe, M Gerstein (2006). Nucleic Acids Res 34: D296-301.

PubNet: a flexible system for visualizing literature derived networks.

SM Douglas, GT Montelione, M Gerstein (2005). Genome Biol 6: R80.

YeastHub: a semantic web use case for integrating data in the life sciences domain.

KH Cheung, KY Yip, A Smith, R Deknikker, A Masiar, M Gerstein (2005). Bioinformatics 21 Suppl 1: i85-96.

Calculation of standard atomic volumes for RNA and comparison with proteins: RNA is packed more tightly.

NR Voss, M Gerstein (2005). J Mol Biol 346: 477-92.

TopNet: a tool for comparing biological sub-networks, correlating protein properties with topological statistics.

H Yu, X Zhu, D Greenbaum, J Karro, M Gerstein (2004). Nucleic Acids Res 32: 328-37.

ExpressYourself: A modular platform for processing and visualizing microarray data.

NM Luscombe, TE Royce, P Bertone, N Echols, CE Horak, JT Chang, M Snyder, M Gerstein (2003). Nucleic Acids Res 31: 3477-82.

SPINE 2: a system for collaborative structural proteomics within a federated database framework.

CS Goh, N Lan, N Echols, SM Douglas, D Milburn, P Bertone, R Xiao, LC Ma, D Zheng, Z Wunderlich, T Acton, GT Montelione, M Gerstein (2003). Nucleic Acids Res 31: 2833-8.

MolMovDB: analysis and visualization of conformational change and structural flexibility.

N Echols, D Milburn, M Gerstein (2003). Nucleic Acids Res 31: 478-82.

Calculations of protein volumes: sensitivity analysis and parameter database.

J Tsai, M Gerstein (2002). Bioinformatics 18: 985-95.

GeneCensus: genome comparisons in terms of metabolic pathway activity and protein family sharing.

J Lin, J Qian, D Greenbaum, P Bertone, R Das, N Echols, A Senes, B Stenger, M Gerstein (2002). Nucleic Acids Res 30: 4574-82.

Beyond synexpression relationships: local clustering of time-shifted and inverted gene expression profiles identifies new, biologically relevant interactions.

J Qian, M Dolled-Filhart, J Lin, H Yu, M Gerstein (2001). J Mol Biol 314: 1053-66.

Determining the minimum number of types necessary to represent the sizes of protein atoms.

J Tsai, N Voss, M Gerstein (2001). Bioinformatics 17: 949-56.

Protein Geometry: Distances, Areas, and Volumes.

M Gerstein, FM Richards (2001). International Tables for Crystallography (Volume F, Chapter 22.1.1, pages 531-539; M Rossmann & E Arnold, editors; Dordrecht: Kluwer).

SPINE: an integrated tracking database and data mining approach for identifying feasible targets in high-throughput structural proteomics.

P Bertone, Y Kluger, N Lan, D Zheng, D Christendat, A Yee, AM Edwards, CH Arrowsmith, GT Montelione, M Gerstein (2001). Nucleic Acids Res 29: 2884-98.

PartsList: a web-based system for dynamically ranking protein folds based on disparate attributes, including whole-genome expression and interaction information.

J Qian, B Stenger, CA Wilson, J Lin, R Jansen, SA Teichmann, J Park, WG Krebs, H Yu, V Alexandrov, N Echols, M Gerstein (2001). Nucleic Acids Res 29: 1750-64.

Whole-genome trees based on the occurrence of folds and orthologs: implications for comparing genomes on different levels.

J Lin, M Gerstein (2000). Genome Res 10: 808-18.

The morph server: a standardized system for analyzing and visualizing macromolecular motions in a database framework.

WG Krebs, M Gerstein (2000). Nucleic Acids Res 28: 1665-75.

Patterns of protein-fold usage in eight microbial genomes: a comprehensive structural census.

M Gerstein (1998). Proteins 33: 518-34.

A database of macromolecular motions.

M Gerstein, W Krebs (1998). Nucleic Acids Res 26: 4280-90.

LPFC: an Internet library of protein family core structures.

R Schmidt, M Gerstein, RB Altman (1997). Protein Sci 6: 246-8.
