Gerstein Lab Highlights in 2020

The Gerstein lab research

During 2020, the Gerstein lab was involved in numerous research projects in human genomics, cancer genomics, cancer evolution and genomic privacy, and also contributed commentaries and book reviews. This work resulted in publications and commentaries in journals such as Nature, Science, Cell, Nature Communications, Genome Biology and others.

Core Publications (for full citations see publication list)

A large part of the lab’s research focuses on cancer, human genomics and genomic privacy, with regard to precision medicine. As part of the Pan-Cancer Analysis of Whole Genomes (PCAWG) consortium, our lab led and participated in projects such as the analysis of more than 2,500 cancer whole genomes to estimate the burden of non-coding somatic drivers. We also developed a framework that uses variant allele frequency to detect the presence and impact of cancer drivers from individual samples. Similarly, as part of the ENCODE consortium, our lab led and participated in a series of publications expanding the encyclopedias of DNA elements for genome interpretation in the human and mouse genomes. For cancer genomics specifically, we developed networks of both transcription factors and RNA-binding proteins to help interpret non-coding mutations and gene regulation. Finally, in the area of genomic privacy and personalized medicine, our lab quantified private information leakage from functional genomics data and developed tools that allow raw functional genomics reads to be shared more safely (data sanitization).

Software tools

In the Gerstein lab, we build tools for the analysis of next-generation sequencing data. Our newly developed software spans cancer genomics, the analysis of structural variants, functional regulation and privacy. In 2020, our lab developed six new tools for the analysis of next-generation sequencing data. SVFX is a machine learning framework that quantifies the pathogenicity of somatic and germline structural variants (SVs), with a focus on cancer-related pathways. RADAR is a variant scoring framework that combines conservation, RNA structure, network centrality and motifs into an overall impact score for non-coding variants associated with the functional dysregulation of RNA-binding proteins (RBPs). FANCY is an application for the fast estimation of privacy risk in functional genomics data. sigLASSO is a software tool that decomposes the counts of cancer mutations into a linear combination of known cancer mutation signatures. DiNeR is a graphical model that highlights key disease regulators by quantifying changes in co-regulatory networks and transcription factor co-binding patterns. Finally, TopicNet is a method that applies latent Dirichlet allocation to pinpoint the transcription factors (TFs) that change substantially between cellular states.

Book reviews, opinions and commentary

In 2020, we participated in public scientific discourse through reviews, opinion articles and commentaries. We published an opinion article in the Hartford Courant, entitled “As coronavirus testing expands, new personal privacy issues arise”, highlighting the dangers of private information leakage from COVID-19 testing during the pandemic. We also published a book review of “How Innovation Works” by M. Ridley and “The Innovation Delusion” by L. Vinsel and A. Russell. The review discusses the different pathways to innovation suggested by the two books at a time when innovation has been reduced to a buzzword.


Data Sanitization to Reduce Private Information Leakage from Functional Genomics.
G Gursoy, P Emani, CM Brannon, OA Jolanki, A Harmanci, JS Strattan, JM Cherry, AD Miranker, M Gerstein (2020). Cell 183: 905-917.e16.

SVFX: a machine learning framework to quantify the pathogenicity of structural variants.
S Kumar, A Harmanci, J Vytheeswaran, MB Gerstein (2020). Genome Biol 21: 274.

Moving beyond buzzwords.
D Greenbaum, M Gerstein (2020). Science 370: 178.

Perspectives on ENCODE.
ENCODE Project Consortium, MP Snyder, TR Gingeras, JE Moore, Z Weng, MB Gerstein, B Ren, RC Hardison, JA Stamatoyannopoulos, BR Graveley, EA Feingold, MJ Pazin, M Pagan, DA Gilchrist, BC Hitz, JM Cherry, BE Bernstein, EM Mendenhall, DR Zerbino, A Frankish, P Flicek, RM Myers (2020). Nature 583: 693-698.

An integrative ENCODE resource for cancer genomics.
J Zhang, D Lee, V Dhiman, P Jiang, J Xu, P McGillivray, H Yang, J Liu, W Meyerson, D Clarke, M Gu, S Li, S Lou, J Xu, L Lochovsky, M Ung, L Ma, S Yu, Q Cao, A Harmanci, KK Yan, A Sethi, G Gursoy, MR Schoenberg, J Rozowsky, J Warrell, P Emani, YT Yang, T Galeev, X Kong, S Liu, X Li, J Krishnan, Y Feng, JC Rivera-Mulia, J Adrian, JR Broach, M Bolt, J Moran, D Fitzgerald, V Dileep, T Liu, S Mei, T Sasaki, C Trevilla-Garcia, S Wang, Y Wang, C Zang, D Wang, RJ Klein, M Snyder, DM Gilbert, K Yip, C Cheng, F Yue, XS Liu, KP White, M Gerstein (2020). Nat Commun 11: 3696.

RADAR: annotation and prioritization of variants in the post-transcriptional regulome of RNA-binding proteins.
J Zhang, J Liu, D Lee, JJ Feng, L Lochovsky, S Lou, M Rutenberg-Schoenberg, M Gerstein (2020). Genome Biol 21: 151.

FANCY: fast estimation of privacy risk in functional genomics data.
G Gursoy, CM Brannon, FCP Navarro, M Gerstein (2020). Bioinformatics 36: 5145-5150.

Using sigLASSO to optimize cancer mutation signatures jointly with sampling likelihood.
S Li, FW Crawford, MB Gerstein (2020). Nat Commun 11: 3575.

DiNeR: a Differential graphical model for analysis of co-regulation Network Rewiring.
J Zhang, J Liu, D Lee, S Lou, Z Chen, G Gursoy, M Gerstein (2020). BMC Bioinformatics 21: 281.

TopicNet: a framework for measuring transcriptional regulatory network change.
S Lou, T Li, X Kong, J Zhang, J Liu, D Lee, M Gerstein (2020). Bioinformatics 36: i474-i481.

Passenger Mutations in More Than 2,500 Cancer Genomes: Overall Molecular Functional Impact and Consequences.
S Kumar, J Warrell, S Li, PD McGillivray, W Meyerson, L Salichos, A Harmanci, A Martinez-Fundichely, CWY Chan, MM Nielsen, L Lochovsky, Y Zhang, X Li, S Lou, JS Pedersen, C Herrmann, G Getz, E Khurana, MB Gerstein (2020). Cell 180: 915-927.e16.

Pan-cancer analysis of whole genomes.
ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium (2020). Nature 578: 82-93.

Analyses of non-coding somatic drivers in 2,658 cancer whole genomes.
E Rheinbay, MM Nielsen, F Abascal, JA Wala, O Shapira, G Tiao, H Hornshøj, JM Hess, RI Juul, Z Lin, L Feuerbach, R Sabarinathan, T Madsen, J Kim, L Mularoni, S Shuai, A Lanzos, C Herrmann, YE Maruvka, C Shen, SB Amin, P Bandopadhayay, J Bertl, KA Boroevich, J Busanovich, J Carlevaro-Fita, D Chakravarty, CWY Chan, D Craft, P Dhingra, K Diamanti, NA Fonseca, A Gonzalez-Perez, Q Guo, MP Hamilton, NJ Haradhvala, C Hong, K Isaev, TA Johnson, M Juul, A Kahles, A Kahraman, Y Kim, J Komorowski, K Kumar, S Kumar, D Lee, KV Lehmann, Y Li, EM Liu, L Lochovsky, K Park, O Pich, ND Roberts, G Saksena, SE Schumacher, N Sidiropoulos, L Sieverling, N Sinnott-Armstrong, C Stewart, D Tamborero, JMC Tubio, HM Umer, L Uuskula-Reimand, C Wadelius, L Wadi, X Yao, CZ Zhang, J Zhang, JE Haber, A Hobolth, M Imielinski, M Kellis, MS Lawrence, C von Mering, H Nakagawa, BJ Raphael, MA Rubin, C Sander, LD Stein, JM Stuart, T Tsunoda, DA Wheeler, R Johnson, J Reimand, M Gerstein, E Khurana, PJ Campbell, N Lopez-Bigas, PCAWG Drivers and Functional Interpretation Working Group, PCAWG Structural Variation Working Group, J Weischenfeldt, R Beroukhim, I Martincorena, JS Pedersen, G Getz, PCAWG Consortium (2020). Nature 578: 102-111.

Estimating growth patterns and driver effects in tumor evolution from individual samples.
L Salichos, W Meyerson, J Warrell, M Gerstein (2020). Nat Commun 11: 732.

