Following closely on the explosion of genomics data, the number of scientific publications in the biomedical sector has grown exponentially. This growth has highlighted the need for tools and infrastructure that automatically process and extract the information recorded in journal articles, using a standardized vocabulary with ontological relationships. To this end, we have developed tools for extracting and analyzing information from the literature. These include PubNet, a web-based application that extracts and integrates information from PubMed and visualizes the resulting complex networks graphically in order to infer functional similarities, and YeastHub, a Semantic Web use case for integrating data in the life sciences domain. Moreover, we have proposed structures for digital publications that facilitate automated literature mining and analyzed trends in biomedical publications on a large scale.

Structuring supplemental materials in support of reproducibility.
D Greenbaum, J Rozowsky, V Stodden, M Gerstein (2017). Genome Biol 18: 64.

Temporal Dynamics of Collaborative Networks in Large Scientific Consortia.
D Wang, KK Yan, J Rozowsky, E Pan, M Gerstein (2016). Trends Genet 32: 251-3.

Closure of the NCBI SRA and implications for the long-term future of genomics data storage.
D Lipman, P Flicek, S Salzberg, M Gerstein, R Knight (2011). Genome Biol 12: 402.

The spread of scientific information: insights from the web usage statistics in PLoS article-level metrics.
KK Yan, M Gerstein (2011). PLoS One 6: e19917.

Reproducible Research: Addressing the need for data and code sharing in computational science
Yale Law School Roundtable on Data and Code Sharing (2010). Computing in Science & Engineering 12(5): 8-13 (Sept/Oct).

Structured digital tables on the Semantic Web: toward a structured digital literature.
KH Cheung, M Samwald, RK Auerbach, MB Gerstein (2010). Mol Syst Biol 6: 403.

Getting started in text mining: part two.
A Rzhetsky, M Seringhaus, MB Gerstein (2009). PLoS Comput Biol 5: e1000411.

Seeking a new biology through text mining.
A Rzhetsky, M Seringhaus, M Gerstein (2008). Cell 134: 9-13.

Open access: taking full advantage of the content.
PE Bourne, JL Fink, M Gerstein (2008). PLoS Comput Biol 4: e1000037.

Manually structured digital abstracts: a scaffold for automatic text mining.
M Seringhaus, M Gerstein (2008). FEBS Lett 582: 1170.

Uncovering trends in gene naming.
MR Seringhaus, PD Cayting, MB Gerstein (2008). Genome Biol 9: 401.

Semantic Web Approach to Database Integration in the Life Sciences
KH Cheung, AK Smith, KYL Yip, CJO Baker, MB Gerstein (2007). in Semantic Web: Revolutionizing Knowledge Discovery in the Life Sciences (eds. C Baker and K Cheung, Springer, NY), pp. 11-30

Semantic Web Standards: Legal and Social Issues and Implications
D Greenbaum, M Gerstein (2007). in Semantic Web: Revolutionizing Knowledge Discovery in the Life Sciences (eds. C Baker and K Cheung, Springer, NY), pp. 413-433

Leveraging the structure of the Semantic Web to enhance information retrieval for proteomics.
A Smith, K Cheung, M Krauthammer, M Schultz, M Gerstein (2007). Bioinformatics 23: 3073-9.

Structured digital abstract makes text mining easy.
M Gerstein, M Seringhaus, S Fields (2007). Nature 447: 142.

RNAi development.
M Gerstein, SM Douglas (2007). PLoS Comput Biol 3: e80.

Publishing perishing? Towards tomorrow's information architecture.
MR Seringhaus, MB Gerstein (2007). BMC Bioinformatics 8: 17.

Chemistry Nobel rich in structure.
M Seringhaus, M Gerstein (2007). Science 315: 40-1.

The Death of the Scientific Paper
M Seringhaus, M Gerstein (2006). The Scientist 20(9): 25.

Data mining on the web.
A Smith, M Gerstein (2006). Science 314: 1682; author reply 1682.

Tools needed to navigate landscape of the genome.
M Gerstein (2006). Nature 440: 740.

Network security and data integrity in academia: an assessment and a proposal for large-scale archiving.
A Smith, D Greenbaum, SM Douglas, M Long, M Gerstein (2005). Genome Biol 6: 119.

PubNet: a flexible system for visualizing literature derived networks.
SM Douglas, GT Montelione, M Gerstein (2005). Genome Biol 6: R80.

Impediments to database interoperation: legal issues and security concerns.
D Greenbaum, A Smith, M Gerstein (2005). Nucleic Acids Res 33: D3-4.

Computer security in academia-a potential roadblock to distributed annotation of the human genome.
D Greenbaum, SM Douglas, A Smith, J Lim, M Fischer, M Schultz, M Gerstein (2004). Nat Biotechnol 22: 771-2.

An analysis of the present system of scientific publishing: what's wrong and where to go from here
D Greenbaum, J Lim, M Gerstein (2003). Interdiscip Sci Rev 28: 293-302.

A universal legal framework as a prerequisite for database interoperability.
D Greenbaum, M Gerstein (2003). Nat Biotechnol 21: 979-82.

Blurring the boundaries between scientific 'papers' and biological databases
M Gerstein, J Junker (2002). Nature Yearbook of Science and Technology 210-212 (ed. D Butler, Palgrave Macmillan Publishers)

Interrelating different types of genomic data, from proteome to secretome: 'oming in on function.
D Greenbaum, NM Luscombe, R Jansen, J Qian, M Gerstein (2001). Genome Res 11: 1463-8.

Annotation of the human genome.
M Gerstein (2000). Science 288: 1590.

E-publishing on the Web: promises, pitfalls, and payoffs for bioinformatics.
M Gerstein (1999). Bioinformatics 15: 429-31.

Forging links in an electronic paper chain.
M Gerstein (1999). Nature 398: 20.
