P20 LM07253-01 (PI Miller)
9/15/01 - 8/31/04
NIH/NLM
Planning a Biomedical Computing Center of Excellence
Role: subproject leader
The overall grant is a pilot pre-center grant. The Gerstein lab is responsible for a small subproject that uses integrative datamining to predict which proteins in a genome are highly expressed.
Biological Objective: Determine Characteristics of Abundant Proteins
The biological objective of this subproject is to understand the characteristics of highly expressed proteins (e.g., function, composition, and structure) and, perhaps, to predict a protein's expression level from these characteristics. Such a predictor may be practically useful for large-scale proteomics projects.
Computational Objective: Integrative Datamining
The computational objective is to address the biological question above through integrative database analysis and datamining that seamlessly connects disparate information resources related to gene expression, functional annotation, regulation, genome sequence, and protein structure. More specifically, we hope to: (i) develop two linked database systems, one for yeast expression data and the other for information related to other protein features; (ii) identify the features most related to expression by extensively cross-referencing the databases and employing a simple enrichment formalism; and (iii) apply datamining and machine learning to the relevant features to see whether a predictive algorithm for expression level can be developed. For the datamining, we will try decision trees first, because the resulting rules are simple, and then Bayesian networks as a second option.
URL: http://crisp.cit.nih.gov/crisp/CRISP_LIB.getdoc?textkey=6528448&p_grant_num=5P20LM007253-02&p_query=&ticket=3180931&p_audit_session_id=15237345&p_keywords=
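The sketch below illustrates what a "simple enrichment formalism" could look like. It assumes that enrichment is the log2 ratio of a feature's frequency among highly expressed proteins to its frequency across the whole genome. The project description does not specify its exact formula. The function name, the protein IDs, and the feature labels are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Set


def feature_enrichment(
    features: Dict[str, Set[str]],  # hypothetical: protein ID -> set of feature labels
    highly_expressed: Set[str],     # hypothetical: IDs labeled "highly expressed" (must be keys of `features`)
) -> Dict[str, float]:
    """Assumed measure: log2(freq. among highly expressed / freq. in whole genome)."""
    n_genome = len(features)
    n_high = len(highly_expressed)
    # Count how many proteins carry each feature, genome-wide and within the highly expressed subset.
    genome_counts = Counter(f for feats in features.values() for f in feats)
    high_counts = Counter(f for pid in highly_expressed for f in features[pid])

    enrichment: Dict[str, float] = {}
    for feat, g_count in genome_counts.items():
        h_count = high_counts.get(feat, 0)
        if h_count == 0:
            continue  # feature absent from the highly expressed set: log ratio undefined, so skip it
        enrichment[feat] = math.log2((h_count / n_high) / (g_count / n_genome))
    return enrichment


# Toy usage with made-up proteins: both "ribosomal" proteins are highly expressed.
toy = {"P1": {"ribosomal"}, "P2": {"ribosomal"}, "P3": {"nuclear"}, "P4": {"membrane"}}
print(feature_enrichment(toy, {"P1", "P2"}))  # {'ribosomal': 1.0}
```

In this sketch, a positive score marks a feature that is over-represented among highly expressed proteins, and 0.0 marks a feature with no enrichment. Features ranked this way could then serve as candidate inputs to a decision tree.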
Articles funded by this grant:
R Jansen, HJ Bussemaker, M Gerstein (2003). Nucleic Acids Res 31: 2242-51.
D Greenbaum, R Jansen, M Gerstein (2002). Bioinformatics 18: 585-96.