We developed a blockchain network for human genomic information (1, see reference list below). This tool brings the decentralization, data ownership, and integrity guarantees of blockchain to genomics; blockchain had previously been limited to fields with smaller datasets. The NY Times quoted the work in relation to big data and privacy.

Our other core research highlights of 2022 included findings in rare cell type discovery from single-cell data, genomic regulation in neuropsychiatric disorders, and differential data privacy across institutions in federated learning. These works were published in journals including Genome Biology, Nature Communications, and Genome Medicine.

Our first such publication advanced methods for clustering single-cell data to discover rare cell types (2). The new method, Forest Fire Clustering, is efficient and interpretable, makes minimal prior assumptions, and calculates the posterior probability of each label assignment, which reflects label confidence and “label entropies” (i.e., transitions). The second addressed gene regulatory mechanisms underlying the association between genetic variants and neuropsychiatric disorders (3). We built upon prior publications that described expression quantitative trait loci regulating nearby genes (cis-eQTLs) by identifying another ~80,000 loci that potentially regulate distant genes (candidate trans-eQTLs) and characterizing their interaction with the cis-eQTLs. These candidate trans-eQTLs included 23 variants and 90 genes that are associated with schizophrenia based on GWAS but were missed by cis-eQTLs.
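As a rough illustration of the label-confidence idea behind Forest Fire Clustering (2), the sketch below estimates a cell's label posterior from repeated Monte Carlo runs and computes its entropy. It is not the published implementation: the function names, the vote-counting estimate of the posterior, and the use of Shannon entropy in bits are all assumptions made for illustration.

```python
import math


def posterior_from_votes(votes, n_labels):
    """Estimate a label posterior for one cell (hypothetical helper).

    Assumption: `votes` holds the label (0..n_labels-1) that each
    Monte Carlo run assigned to the cell; the posterior is taken to be
    the fraction of runs that chose each label.
    """
    counts = [0] * n_labels
    for label in votes:
        counts[label] += 1
    return [c / len(votes) for c in counts]


def label_entropy(posterior):
    """Shannon entropy (bits) of a label posterior.

    Low entropy: the cell is confidently assigned to one label.
    High entropy: the cell sits between labels (a possible transition).
    """
    return sum(-p * math.log2(p) for p in posterior if p > 0)


# Hypothetical votes for two cells over eight simulated runs.
stable_cell = posterior_from_votes([0, 0, 0, 0, 0, 0, 0, 0], n_labels=2)
ambiguous_cell = posterior_from_votes([0, 1, 0, 1, 1, 0, 1, 0], n_labels=2)

print(label_entropy(stable_cell), label_entropy(ambiguous_cell))
```

Under these assumptions, a cell whose runs always agree has zero entropy, while a cell whose runs split evenly between two labels has the maximum entropy for two labels.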

We also continued to contribute to healthcare data management and security. Specifically, we published the first federated learning framework to implement differential privacy protocols on a per-institution basis (4). This tool will expand the range of institutional databases that a federated learning project can draw on to train machine learning models.
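To make the idea of per-institution differential privacy in federated learning more concrete, here is a minimal sketch in which each site clips its model update and adds Gaussian noise at its own level before the server averages the results. This is not the framework from (4): the Gaussian mechanism, the clipping norm, the site names, and the noise multipliers are all assumptions chosen for illustration.

```python
import math
import random


def clip(update, max_norm):
    """Scale an update down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]


def privatize(update, max_norm, noise_multiplier, rng):
    """Clip an update, then add Gaussian noise (assumed mechanism).

    A larger noise_multiplier means stronger privacy for that site,
    at the cost of a noisier contribution.
    """
    sigma = noise_multiplier * max_norm
    return [x + rng.gauss(0.0, sigma) for x in clip(update, max_norm)]


def federated_average(updates):
    """Server step: coordinate-wise mean of the site updates."""
    return [sum(column) / len(updates) for column in zip(*updates)]


rng = random.Random(0)

# Hypothetical sites, local updates, and per-site privacy settings.
site_updates = {"hospital_a": [0.9, -0.4], "hospital_b": [3.0, 4.0]}
site_noise = {"hospital_a": 0.5, "hospital_b": 2.0}

noisy_updates = [
    privatize(update, max_norm=1.0, noise_multiplier=site_noise[site], rng=rng)
    for site, update in site_updates.items()
]
global_update = federated_average(noisy_updates)
print(global_update)
```

In this sketch each institution chooses its own noise level, so a site with stricter privacy requirements can still take part without imposing those requirements on the others.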

We also published several perspective and opinion pieces. In particular, we engaged in editorial dialogue regarding the overlap of genetic specimen authentication techniques between biomedical research and the justice system (5), and offered an updated look at genetic issues presaged by the science fiction film GATTACA 25 years ago (6).

We contributed book reviews and career columns to top journals. We reviewed “The Man from the Future: The Visionary Life of John von Neumann” (7) and also teamed up with doctoral student Jonathan Park to write about switching labs during a PhD (8).

We participated heavily in two major multisite genomics publications in Nature, in the fields of neuro- and cancer genomics (9, 10).

References to relevant publications

1. Gürsoy G, Brannon CM, Ni E, Wagner S, Khanna A, Gerstein M. Storing and analyzing a genome on a blockchain. Genome Biol. 2022;23(1):134. Published 2022 Jun 29. doi:10.1186/s13059-022-02699-7

2. Chen Z, Goldwasser J, Tuckman P, Liu J, Zhang J, Gerstein M. Forest Fire Clustering for single-cell sequencing combines iterative label propagation with parallelized Monte Carlo simulations. Nat Commun. 2022;13(1):3538. Published 2022 Jun 20. doi:10.1038/s41467-022-31107-8

3. Liu S, Won H, Clarke D, et al. Illuminating links between cis-regulators and trans-acting variants in the human prefrontal cortex. Genome Med. 2022;14(1):133. Published 2022 Nov 24. doi:10.1186/s13073-022-01133-8

4. Khanna A, Schaffer V, Gürsoy G, Gerstein M. Privacy-preserving Model Training for Disease Prediction Using Federated Learning with Differential Privacy. Annu Int Conf IEEE Eng Med Biol Soc. 2022;2022:1358-1361. doi:10.1109/EMBC48229.2022.9871742

5. Greenbaum D, Gerstein M. Genomic research data and the justice system. Science. 2022;377(6608):826-827. doi:10.1126/science.add7974

6. Greenbaum D, Gerstein M. GATTACA is still pertinent 25 years later. Nat Genet. 2022;54(12):1758-1760. doi:10.1038/s41588-022-01242-5

7. Greenbaum D, Gerstein M. The lasting legacy of John von Neumann. Science. 2022;375(6584):983. doi:10.1126/science.abn7018

8. Park J, Gerstein M. Switching labs during a PhD [published online ahead of print, 2022 Jul 6]. Nature. 2022. doi:10.1038/d41586-022-01867-w

9. Gandal MJ, Haney JR, Wamsley B, et al. Broad transcriptomic dysregulation occurs across the cerebral cortex in ASD. Nature. 2022;611(7936):532-539. doi:10.1038/s41586-022-05377-7

10. Erwin GS, Gürsoy G, Al-Abri R, et al. Recurrent repeat expansions in human cancer genomes [published online ahead of print, 2022 Dec 14]. Nature. 2022. doi:10.1038/s41586-022-05515-1


Recurrent repeat expansions in human cancer genomes
GS Erwin, G Gursoy, R Al-Abri, A Suriyaprakash, E Dolzhenko, K Zhu, CR Hoerner, SM White, L Ramirez, A Vadlakonda, A Vadlakonda, K von Kraut, J Park, CM Brannon, DA Sumano, RA Kirtikar, AA Erwin, TJ Metzner, RKC Yuen, AC Fan, JT Leppert, MA Eberle, M Gerstein, MP Snyder (2023). Nature 613: 96-102.

Genomic research data and the justice system.
D Greenbaum, M Gerstein (2022). Science 377: 826-827.

GATTACA is still pertinent 25 years later
D Greenbaum, M Gerstein (2022). Nat Genet 54: 1758-1760.

Illuminating links between cis-regulators and trans-acting variants in the human prefrontal cortex.
S Liu, H Won, D Clarke, N Matoba, S Khullar, Y Mu, D Wang, M Gerstein (2022). Genome Med 14: 133.

Broad transcriptomic dysregulation occurs across the cerebral cortex in ASD.
MJ Gandal, JR Haney, B Wamsley, CX Yap, S Parhami, PS Emani, N Chang, GT Chen, GD Hoftman, D de Alba, G Ramaswami, CL Hartl, A Bhattacharya, C Luo, T Jin, D Wang, R Kawaguchi, D Quintero, J Ou, YE Wu, NN Parikshak, V Swarup, TG Belgard, M Gerstein, B Pasaniuc, DH Geschwind (2022). Nature 611: 532-539.

Privacy-preserving Model Training for Disease Prediction Using Federated Learning with Differential Privacy.
A Khanna, V Schaffer, G Gursoy, M Gerstein (2022). Annu Int Conf IEEE Eng Med Biol Soc 2022: 1358-1361.

Switching labs during a PhD.
J Park, M Gerstein (2022). Nature.

Storing and analyzing a genome on a blockchain.
G Gursoy, CM Brannon, E Ni, S Wagner, A Khanna, M Gerstein (2022). Genome Biol 23: 134.

Forest Fire Clustering for single-cell sequencing combines iterative label propagation with parallelized Monte Carlo simulations.
Z Chen, J Goldwasser, P Tuckman, J Liu, J Zhang, M Gerstein (2022). Nat Commun 13: 3538.

The lasting legacy of John von Neumann
D Greenbaum, M Gerstein (2022). Science 375: 983.

