Aim 2: Build the comprehensive immune profiling data hub (HeaLTH) for HIV/SUD-affected individuals and construct disease- and cell-type-specific regulatory networks.
We developed a statistical modeling approach, MLCrosstalk (multiple-layer crosstalk), to construct the full interactome related to viral infection. MLCrosstalk integrates data from multiple sources, including human protein-coding genes, miRNAs, microbes, and human protein-protein interactions (Lou et al., Gerstein 2023 PLoS Comput Biol). We have applied this model to the SARS-CoV-2 virus and are now applying it to HIV.
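MLCrosstalk's statistical model is described in Lou et al. and is not reproduced here. As a minimal sketch of the general idea of integrating several data layers into one interactome, the code below merges hypothetical per-layer edge lists (protein-protein, miRNA-gene, microbe-gene) into a single weighted graph and propagates relevance from seed genes with a random walk with restart, a standard network-propagation technique swapped in here for illustration. Node names, edge weights, and the restart probability of 0.3 are placeholders, not real interactions or published parameters.

```python
from collections import defaultdict

# Toy, hypothetical edge lists, one per data layer (placeholder names, not real interactions).
# Each edge is (node_u, node_v, weight).
LAYERS = {
    "protein_protein": [("gene_A", "gene_B", 1.0), ("gene_B", "gene_C", 0.5)],
    "mirna_gene": [("miRNA_1", "gene_A", 0.8)],
    "microbe_gene": [("microbe_1", "gene_C", 0.3)],
}


def build_multilayer_graph(layers):
    """Merge per-layer edge lists into one undirected, weighted adjacency map."""
    adj = defaultdict(dict)
    for edges in layers.values():
        for u, v, w in edges:
            # If the same pair appears in several layers, their weights are summed.
            adj[u][v] = adj[u].get(v, 0.0) + w
            adj[v][u] = adj[v].get(u, 0.0) + w
    return adj


def random_walk_with_restart(adj, seeds, restart=0.3, iters=100):
    """Spread relevance from seed nodes across all layers; returns a score per node."""
    nodes = sorted(adj)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(iters):
        # With probability `restart` jump back to a seed; otherwise follow a weighted edge.
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            total = sum(adj[u].values())
            for v, w in adj[u].items():
                nxt[v] += (1.0 - restart) * p[u] * w / total
        p = nxt
    return p


graph = build_multilayer_graph(LAYERS)
scores = random_walk_with_restart(graph, seeds={"gene_A"})
```

In this toy graph the seed gene keeps the highest score, while nodes reachable only through other layers (for example, `microbe_1` via `gene_C`) receive smaller but nonzero scores, which is the kind of cross-layer signal an integrated interactome is meant to expose.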
Aim 3: Develop novel machine learning models to uncover how key transcriptional, epigenetic, and network changes in CD4+ T cells upon HIV infection and/or SUD can lead to immune dysfunction.
In line with this aim's focus on regulatory networks, we introduced a concept called the C-score, which measures the crowdedness of TF signals, to identify the target motifs of TFs. The score is based on the number of TFs predicted to bind a given genomic region. C-scores help separate TF signals that reflect specific binding of a TF to its motif from those arising from non-specific interactions, thereby significantly improving the accuracy of target-motif inference. We have inferred motifs for all TFs from human cell lines, and these are available on our website (Xu et al., Gerstein 2024 Nucleic Acids Res).
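The exact C-score formulation is given in Xu et al.; the sketch below is a simplified illustration built on one assumption: a region's crowdedness is the number of distinct TFs whose predicted binding sites overlap it, and a site's specificity weight is the reciprocal of that count. A TF's sites are then ranked so that its least crowded sites, the ones least likely to reflect non-specific binding, come first as input for motif discovery. The site list, TF names, and the `1 / count` weighting are hypothetical.

```python
# Hypothetical predicted binding sites: (tf, chrom, start, end), half-open coordinates.
SITES = [
    ("TF1", "chr1", 100, 200),
    ("TF2", "chr1", 150, 250),
    ("TF3", "chr1", 120, 180),
    ("TF1", "chr1", 1000, 1100),
    ("TF2", "chr2", 500, 600),
]


def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def crowdedness(sites):
    """For each site, count distinct TFs (including its own) predicted to bind that region."""
    counts = []
    for _, chrom, start, end in sites:
        region = (chrom, start, end)
        tfs = {other_tf for other_tf, c, s, e in sites if overlaps(region, (c, s, e))}
        counts.append(len(tfs))
    return counts


def rank_sites_for_motif_search(sites, tf):
    """Order one TF's sites by specificity weight 1 / crowdedness, highest first."""
    counts = crowdedness(sites)
    own = [(1.0 / n, site) for site, n in zip(sites, counts) if site[0] == tf]
    # Python's sort is stable, so ties keep their input order.
    return [site for _, site in sorted(own, key=lambda pair: pair[0], reverse=True)]
```

The pairwise overlap check is quadratic in the number of sites; genome-scale data would call for a sorted sweep or an interval tree instead.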
Finally, we developed a new machine learning and big data approach to study disease dynamics (Salichos et al., Gerstein 2023 Sci Rep). We also developed a general machine learning framework for studying latent signatures. This deep generative architecture, a type of variational autoencoder, can be applied to investigate molecular alterations in biological systems following infection (Warrell et al., Gerstein 2024 J R Soc Interface).
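The architecture of Warrell et al. is not reproduced here. As a minimal sketch of what makes a model a variational autoencoder, the code below implements the standard training objective of a Gaussian VAE: the reparameterization trick, the closed-form KL divergence between a diagonal Gaussian posterior and a standard normal prior, and a Gaussian reconstruction log-likelihood, combined into the evidence lower bound (ELBO). Encoder and decoder networks are omitted; their outputs (`mu`, `log_var`, `x_hat`) are supplied as plain hypothetical lists, and the unit noise variance is an assumption.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1) (the reparameterization trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


def gaussian_recon_loglik(x, x_hat, noise_var=1.0):
    """Log-likelihood of data x under N(x_hat, noise_var * I)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return -0.5 * (len(x) * math.log(2 * math.pi * noise_var) + sq / noise_var)


def elbo(x, x_hat, mu, log_var):
    """Evidence lower bound: reconstruction term minus the KL regularizer."""
    return gaussian_recon_loglik(x, x_hat) - kl_to_standard_normal(mu, log_var)


rng = random.Random(0)
# Hypothetical data and encoder outputs for a single sample.
x = [0.5, -0.2, 1.0]
mu, log_var = [0.1, -0.3], [-1.0, -0.5]
z = reparameterize(mu, log_var, rng)  # in a real model, z is passed to the decoder
x_hat = [0.4, -0.1, 0.9]  # hypothetical decoder output
print(elbo(x, x_hat, mu, log_var))
```

Training a VAE maximizes this ELBO averaged over samples; the KL term keeps the latent space close to the prior, so the learned latent coordinates can be compared across conditions.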