Summary of Gerstein Lab Research in 2019
During 2019, the Gerstein lab was involved in numerous research projects in the areas of human genomics, next-generation sequencing, and genomic privacy, as well as in scientific commentary. These projects resulted in publications and commentaries in journals such as Science, PNAS, PLoS Comput Biol, PLoS Genet and others.
Core Publications (for full citations see publication list)
A large part of the lab’s research focuses on cancer genomics with regard to precision medicine. By integrating 3D protein structures and dynamics, we developed a framework that identifies cancer driver genes through a sensitive search for mutational hotspot communities in 3D structures. The framework recovered previously identified driver genes as well as previously unidentified putative drivers. We also built a hybrid physical-statistical classifier for predicting the effect of variants on protein-drug interactions. We integrated genetic variations with protein-drug complex structures and used physics-based calculations to parameterize the predictive model. Using our classifier, we efficiently predicted the impacts of single nucleotide variants on protein-drug interactions.
In the Gerstein lab, we develop tools for the analysis of next-generation sequencing data. TeXP is a new method that accounts for and removes the effects of pervasive transcription in order to quantify the activity of repetitive sequences such as LINEs. Using TeXP, we processed thousands of transcriptome datasets to measure LINE-1 activity across healthy somatic cells. We found that LINE-1 is broadly expressed in healthy tissues, that the adult brain shows low levels of LINE transcription, and that the LINE-1 transcription level is correlated with tissue cell turnover. We also developed exceRpt, a platform that processes and analyzes exRNA profiling data. exceRpt generates quality control metrics, RNA abundance estimates, and reports, and it is used to process all RNA-seq datasets in the exRNA Atlas. Finally, our lab developed GRAM, a GeneRAlized Model that predicts the molecular effect of a non-coding variant in a cell-type-specific manner. We benchmarked GRAM on large-scale MPRA datasets, achieving AUROC scores of 0.72 in GM12878 and 0.66 in a multi-cell line dataset. We then evaluated the performance of GRAM on targeted regions using luciferase assays in the MCF7 and K562 cell lines.
Book reviews, opinions and commentary
In 2019, we participated in public scientific discourse through reviews, opinion articles and commentaries. We published an opinion article analyzing the technical and cultural “exports” and “imports” between genomics and other data-science subdomains. In it, we also discussed how data value, privacy, and ownership are pressing issues for data science applications. We also published a commentary on “Collecting Experiments”, a data management book by Bruno J. Strasser. Our commentary highlights issues of privacy, sharing and ownership of human-subject data with regard to evolving databases, molecular sequences and structures.