Before formal publication in a scholarly journal, scientific and medical articles are traditionally “peer reviewed.” In this process, the journal’s editors take advice from various experts, called “referees,” who have assessed the paper and may identify weaknesses in its assumptions, methods, and conclusions. Typically a journal will publish an article only once the editors are satisfied that the authors have addressed the referees’ concerns.

Because this process can be lengthy, authors use the bioRxiv service to make their manuscripts available as “preprints” before peer review, allowing other scientists to see, discuss, and comment on the findings immediately. Readers should therefore be aware that articles on bioRxiv have not been finalized by authors, might contain errors, and report information that has not yet been accepted or endorsed in any way by the scientific or medical community.

MolLM: A Unified Language Model to Integrate Biomedical Text with 2D and 3D Molecular Representations
Xiangru Tang, Andrew Tran, Jeffrey Tan, Mark B. Gerstein (2023). bioRxiv.

Igniting Language Intelligence: The Hitchhiker's Guide From Chain-of-Thought Reasoning to Language Agents
Zhuosheng Zhang, Yao Yao, Aston Zhang, Xiangru Tang, Xinbei Ma, Zhiwei He, Yiming Wang, Mark Gerstein, Rui Wang, Gongshen Liu, Hai Zhao (2023). arXiv.

MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning
Xiangru Tang, Anni Zou, Zhuosheng Zhang, Yilun Zhao, Xingyao Zhang, Arman Cohan, Mark Gerstein (2023). arXiv.

ML-Bench: Large Language Models Leverage Open-source Libraries for Machine Learning Tasks
Yuliang Liu, Xiangru Tang, Zefan Cai, Junjie Lu, Yichi Zhang, Yanjun Shao, Zexuan Deng, Helan Hu, Zengxian Yang, Kaikai An, Ruijun Huang, Shuzheng Si, Sheng Chen, Haozhe Zhao, Zhengliang Li, Liang Chen, Yiming Zong, Yan Wang, Tianyu Liu, Zhiwei Jiang, Baobao Chang, Yujia Qin, Wangchunshu Zhou, Yilun Zhao, Arman Cohan, Mark Gerstein (2023). arXiv.

Struc-Bench: Are Large Language Models Really Good at Generating Complex Structured Data?
Xiangru Tang, Yiming Zong, Jason Phang, Yilun Zhao, Wangchunshu Zhou, Arman Cohan, Mark Gerstein (2023). arXiv.

BioCoder: A Benchmark for Bioinformatics Code Generation with Contextual Pragmatic Knowledge
Xiangru Tang, Bill Qian, Rick Gao, Jiakang Chen, Xinyun Chen, Mark Gerstein (2023). arXiv.

Leveraging a large language model to predict protein phase transition: a physical, multiscale and interpretable approach
M Frank, P Ni, M Jensen, MB Gerstein (2023). bioRxiv.

chronODE: A framework to integrate time-series multi-omics data based on ordinary differential equations combined with machine learning
B Borsari, M Frank, ES Wattenberg, K Xu, S Liu, X Yu, MB Gerstein (2023). bioRxiv.

The ENCODE4 long-read RNA-seq collection reveals distinct classes of transcript structure diversity
F Reese, B Williams, G Balderrama-Gutierrez, D Wyman, MH Celik, E Rebboah, N Rezaie, D Trout, M Razavi-Mohseni, Y Jiang, B Borsari, S Morabito, HY Liang, CJ McGill, S Rahmanian, J Sakr, S Jiang, W Zeng, K Carvalho, AK Weimer, LA Dionne, A McShane, K Bedi, SI Elhajjajy, S Upchurch, J Jou, I Youngworth, I Gabdank, P Sud, O Jolanki, JS Strattan, MS Kagda, MP Snyder, BC Hitz, JE Moore, Z Weng, D Bennett, L Reinholdt, M Ljungman, MA Beer, MB Gerstein, L Pachter, R Guigo, BJ Wold, A Mortazavi (Preprint). bioRxiv.

REPIC — an ensemble learning methodology for cryo-EM particle picking
CJF Cameron, SJH Seager, FJ Sigworth, HD Tagare, MB Gerstein (2023). bioRxiv.

Compression-based Network Interpretability Schemes
J Warrell, H Mohsen, M Gerstein (2020). bioRxiv.

LESSeq: Local event-based analysis of alternative splicing using RNA-Seq data
J Leng, CJF Cameron, S Oh, E Khurana, JP Noonan, MB Gerstein (2019). bioRxiv.

Latent Evolutionary Signatures: A General Framework for Analyzing Music and Cultural Evolution
J Warrell, L Salichos, M Gancz, MB Gerstein (2024). J R Soc Interface 21: 20230647.

