In 2024, we contributed to the fields of neurogenomics, single-cell transcriptomics, regulatory networks, structural biology, and bioinformatics, developing software tools and publishing in journals including PNAS, Science, NAR, and Bioinformatics. Many projects were part of large consortia, including PsychENCODE, GENCODE, IGVF (Impact of Genomic Variation on Function), HGSVC (Human-Genome Structural-Variation Consortium), modERN (Model-Organism Encyclopedia of Regulatory Networks), and SCORCH (Single-Cell Opioid-Responses in the Context of HIV).
The neurogenomics work was our most significant highlight (Emani et al., Science, 2024). Within the PsychENCODE consortium, we analyzed single-nucleus multi-omics data from 388 brains, uncovering 1.4 million expression quantitative-trait loci and cell-type-specific regulatory networks. This work revealed cellular changes in aging and neuropsychiatric disorders and enabled the construction of deep-learning models that prioritize disease-risk genes and drug targets.
The application of large language models (LLMs) to biomedical challenges was a significant focus of our research. In particular, we surveyed LLMs and generative AI in drug design (Tang et al., Brief Bioinform, 2024). We also introduced MolLM, a unified language model that integrates 2D and 3D molecular structures with biomedical text for tasks such as molecule-text matching, property prediction, and text-prompted molecular editing. The results underscore the importance of explicit 3D molecular representations in enhancing cross-modal capabilities (Tang et al., Bioinformatics, 2024).
BioCoder, a benchmark for bioinformatics code generation, demonstrates how LLMs can help automate repetitive coding tasks, though challenges remain in handling complex bioinformatics pipelines (Tang et al., Bioinformatics, 2024). We also fine-tuned an LLM to predict protein phase transitions (Frank et al., Proc Natl Acad Sci U S A, 2024). This work demonstrated that more extensive aggregation is associated with reduced gene expression in Alzheimer's disease, suggesting a natural defense mechanism.
For the phase-transition problem, we also showed how a graph neural network can more precisely define disordered regions of proteins, a key biophysical feature in predicting transitions (Wang et al., Cell Rep Phys Sci, 2024). Additional LLM work includes our contributions to FAVOR-GPT, a generative interface for interpreting genomic variant annotations, and the Dense Homolog Retriever, an LLM that enhances homolog detection (Li et al., Bioinform Adv, 2024; Hong et al., Nat Biotechnol, 2024).
Much of our remaining research focused on tool development in biomedical data science.
In particular, we developed REPIC, a consensus-based method for cryo-EM particle picking that integrates outputs from multiple picking algorithms into high-quality consensus reconstructions, reducing user burden and enhancing accuracy (Cameron et al., Commun Biol, 2024).
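The idea of consensus picking can be illustrated with a toy sketch. This is not REPIC's actual algorithm; it is a simplified voting scheme written under stated assumptions: each picker returns 2D particle coordinates, picks from the first picker serve as seeds, and the names `consensus_picks`, `radius`, and `min_votes` are hypothetical choices made for this illustration.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def consensus_picks(
    picker_outputs: List[List[Point]],
    radius: float,
    min_votes: int,
) -> List[Point]:
    """Toy consensus: keep a seed pick if enough pickers agree on it.

    Assumption (not from the paper): picks of the first picker act as seeds,
    and another picker "agrees" when its nearest pick lies within `radius`.
    """
    consensus: List[Point] = []
    seeds = picker_outputs[0]
    for seed in seeds:
        matched = [seed]  # the seed counts as one vote
        for other in picker_outputs[1:]:
            if not other:
                continue
            nearest = min(other, key=lambda p: math.dist(p, seed))
            if math.dist(nearest, seed) <= radius:
                matched.append(nearest)
        if len(matched) >= min_votes:
            # Report the centroid of all agreeing picks.
            cx = sum(p[0] for p in matched) / len(matched)
            cy = sum(p[1] for p in matched) / len(matched)
            consensus.append((cx, cy))
    return consensus


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    picker_a = [(10.0, 10.0), (50.0, 50.0)]
    picker_b = [(11.0, 10.0), (90.0, 90.0)]
    picker_c = [(10.0, 11.0)]
    print(consensus_picks([picker_a, picker_b, picker_c], radius=3.0, min_votes=3))
```

A seed-based vote like this misses particles that the first picker did not find, which is one reason a real consensus method needs a more careful, symmetric formulation.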
We also developed an ensemble framework that combines empirical docking with deep-learning models, significantly improving binding-affinity predictions. The optimized meta-models outperformed the individual base models (Lee et al., J Chem Inf Model, 2024).
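As a minimal sketch of the meta-modeling idea, the snippet below blends two base predictors with a single least-squares weight. It is an assumption-laden illustration, not the published framework: the two-model setup, the function names (`fit_blend_weight`, `blend`, `mse`), and the toy numbers are all hypothetical.

```python
from typing import List, Sequence


def mse(preds: Sequence[float], targets: Sequence[float]) -> float:
    """Mean squared error between predictions and measured values."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)


def fit_blend_weight(
    docking: Sequence[float],
    deep: Sequence[float],
    targets: Sequence[float],
) -> float:
    """Least-squares weight w for the blend w * docking + (1 - w) * deep.

    Minimizing sum((w * (a - b) - (t - b)) ** 2) over w gives the closed form
    below; the result is clamped to [0, 1] so the blend stays a convex mix.
    """
    num = sum((a - b) * (t - b) for a, b, t in zip(docking, deep, targets))
    den = sum((a - b) ** 2 for a, b in zip(docking, deep))
    if den == 0.0:
        return 0.5  # base models agree everywhere; any weight is equivalent
    return min(1.0, max(0.0, num / den))


def blend(docking: Sequence[float], deep: Sequence[float], w: float) -> List[float]:
    """Apply the fitted weight to produce meta-model predictions."""
    return [w * a + (1.0 - w) * b for a, b in zip(docking, deep)]


if __name__ == "__main__":
    # Made-up affinity scores, for illustration only.
    docking_scores = [1.0, 2.0, 3.0]
    deep_scores = [2.0, 3.0, 4.0]
    measured = [1.5, 2.5, 3.5]
    w = fit_blend_weight(docking_scores, deep_scores, measured)
    print(w, mse(blend(docking_scores, deep_scores, w), measured))
```

In practice the weight would be fitted on held-out predictions rather than on the data used to train the base models, so that the meta-model does not simply reward whichever base model overfits most.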
We explored deep-learning methods for the early detection of Alzheimer's disease, highlighting the complexities of integrating multimodal medical datasets and demonstrating the potential of medical imaging for predicting the disease (Zhou et al., PLoS One, 2024).
We introduced LatentDAG, a Bayesian network that simplifies gene expression relationships. In combination with a graph neural network, LatentDAG improves tasks such as gene conservation prediction and gene clustering (Gao et al., Bioinformatics, 2024).
We introduced the concept of transcription-factor (TF) signal 'crowdedness' to address interference from non-target motifs in ChIP-seq data, which improved the accuracy of motif inference (Xu et al., Nucleic Acids Res, 2024).
Finally, we analyzed music and cultural evolution, revealing patterns that resemble biological evolution (Warrell et al., J R Soc Interface, 2024).