Aim 1: Integrate high-dimensional data to build the comprehensive cell atlas, dissect the cis-regulatory landscape, and construct multi-modal gene regulatory networks in CD4+ T cells from healthy individuals.

 

To support the integration of multiple types of molecular data required by this aim, we developed a customized biomedical adapter tool that uniformly processes, prepares, and registers diverse biomedical data for Bayesian analysis. The analysis corrects for complex covariate structures so that the significance of an intervention over time can be evaluated. We applied the software implementation of the tool to a structured set of examples with rich time-series data streams. First, we showed that it could detect the effect of an exercise intervention on stabilizing continuously monitored blood glucose in diabetes after adjusting for four other continuous variables from wearable sensors. Second, we showed that it could detect the effect of a neighborhood watch program on reducing violent behavior in its own neighborhood after adjusting for many other factors with a paired-covariate strategy. Overall, the tool stands apart from similar time-series methods in the number of complex covariates it can accommodate, a capability that will become increasingly relevant as technological developments increase the number of biometrics that can be measured simultaneously.

 

We therefore organized a virtual panel of stakeholders to guide the pragmatic aspects of such technological developments by discussing standards for device quality and data formatting for wearable biosensors. The panel was attended by 121 delegates from 15 countries, representing both researchers and industry manufacturers. The proceedings supported the need for a networking hub through which researchers and manufacturers can guide each other on these priorities.

 

We have also developed a genomics-specific integration tool called Gene Tracer, a voice-activated tool that searches and visualizes disease-associated gene information, deleterious mutations, and gene networks. Because spoken commands can be recognized and understood well, Gene Tracer gives users greater flexibility in acquiring knowledge and is broadly applicable to other scenarios. Gene Tracer can (1) retrieve genomic information, (2) visualize genomic regions, (3) display genomic mutations, (4) display genomic networks, and (5) record browsing history. (Lou et al., Gerstein 2021 Bioinformatics)

 

For the cis-regulatory component (i.e., the identification and localization of cis-regulatory elements [CREs] in human cell lines from high-throughput STARR-seq experiments), we have developed a two-step model called Deep-learning framework for Condensing enhancers and refining boundaries with large-scale functional assays (DECODE). In the first step, we trained a deep neural network to accurately identify cell-type-specific enhancers from STARR-seq readouts. In the second step, we implemented a weakly supervised object detection framework that uses Gradient-weighted Class Activation Mapping (Grad-CAM) to detect the precise boundaries of enhancers. The model outperformed a state-of-the-art enhancer prediction method by 24% in transgenic mouse validation. Furthermore, the object detection framework can condense enhancer annotations to only 13% of their original size, and these compact annotations have significantly higher conservation scores and genome-wide association study variant enrichment than the original predictions. (Chen et al., Gerstein 2021 Bioinformatics)
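
The boundary-refinement idea in the second step can be illustrated with a minimal sketch. This is not the DECODE implementation. It assumes a per-position importance profile (such as a Grad-CAM-style activation map, which is non-negative) is already available for a predicted enhancer window, and it condenses the window to the contiguous run around the peak that stays above a relative cutoff. The function name `condense_enhancer`, the `rel_threshold` parameter, and the example scores are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def condense_enhancer(
    scores: List[float], start: int, rel_threshold: float = 0.5
) -> Tuple[int, int]:
    """Condense a predicted enhancer window to a compact core region.

    scores        : hypothetical non-negative importance score per position
                    (e.g. from a Grad-CAM-style activation map).
    start         : genomic coordinate of scores[0].
    rel_threshold : assumed cutoff, as a fraction of the peak score.

    Returns a half-open interval (core_start, core_end) in genomic
    coordinates covering the contiguous run around the peak whose
    scores are all >= rel_threshold * peak score.
    """
    if not scores:
        raise ValueError("scores must be non-empty")

    # Locate the position with the highest importance (first one on ties).
    peak = max(range(len(scores)), key=scores.__getitem__)
    cutoff = rel_threshold * scores[peak]

    # Extend left from the peak while scores stay at or above the cutoff.
    left = peak
    while left > 0 and scores[left - 1] >= cutoff:
        left -= 1

    # Extend right from the peak in the same way.
    right = peak
    while right < len(scores) - 1 and scores[right + 1] >= cutoff:
        right += 1

    return start + left, start + right + 1


# Hypothetical importance profile over a 10-position window starting at 1000.
example_scores = [0.0, 0.1, 0.2, 0.7, 1.0, 0.8, 0.3, 0.1, 0.0, 0.0]
core = condense_enhancer(example_scores, start=1000)
retained_fraction = (core[1] - core[0]) / len(example_scores)
```

In this toy example the core interval keeps 3 of the 10 positions. The relative cutoff is the main design choice: a higher value gives more compact annotations at the risk of trimming functional sequence, so in practice it would need to be tuned against validation data.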