BioMamba: Domain-Adaptive Biomedical Language Models
Ling Yue, Mingzhi Zhu, Sixue Xing, Shaowu Pan, Vijil Chenthamarakshan, Yanbo Wang, Yunning Cao, Payel Das, Tianfan Fu · Aug 5, 2024
Abstract
Background: Biomedical language models should improve performance on biomedical text while retaining general-domain language ability. For Mamba-based models, this trade-off has not been clearly studied across biomedical literature and clinical text.
Methods: We developed BioMamba, a family of biomedical models obtained by continued pretraining of public Mamba2 checkpoints on PubMed, with small amounts of general-domain data from the Colossal Clean Crawled Corpus (C4) and Wikipedia included to help preserve general-domain language ability. We evaluated language modeling and three downstream tasks across multiple model scales: clinical note completion, discharge summary generation, and biomedical yes/no question answering.
Results: BioMamba consistently improved PubMed modeling, improved Wikipedia modeling, and left C4 performance largely unchanged. After supervised fine-tuning (SFT), BioMamba transferred well to both biomedical literature and clinical text, yielding strong results on completion, summarization, and question answering. On MIMIC-IV, BioMamba+SFT consistently matched or exceeded SFT from the corresponding base checkpoints across note completion and discharge summary generation. The strongest model achieved a PubMed perplexity of 5.28 and accuracies of 90.24% and 73.00% on BioASQ and PubMedQA, respectively.
Conclusion: A balanced domain-adaptive pretraining strategy strengthens Mamba language models for both biomedical literature and clinical text while preserving general-domain language capabilities, establishing BioMamba as a practical foundation for biomedical NLP applications.
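The abstract reports language-modeling quality as perplexity. As a reminder of what that number means, the following is a minimal sketch of the standard definition: the exponential of the mean per-token negative log-likelihood. The function name `perplexity` and the example values are illustrative assumptions and are not taken from the paper; the paper's own evaluation pipeline, tokenizer, and averaging scheme may differ.

```python
import math


def perplexity(token_log_probs):
    """Return perplexity given per-token natural-log probabilities.

    Hypothetical helper for illustration only; not the paper's evaluation code.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    # Mean negative log-likelihood in nats per token.
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Made-up example: a model that assigns probability 0.25 to every token
# is exactly as uncertain as a uniform choice among 4 options.
example_log_probs = [math.log(0.25)] * 8
example_ppl = perplexity(example_log_probs)
print(f"example perplexity: {example_ppl:.2f}")
```

Under this definition, a perplexity of 5.28 corresponds to a mean negative log-likelihood of about ln(5.28) ≈ 1.66 nats per token. Perplexity depends on the tokenizer, so values are only directly comparable between models that share one.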