Recently, the paper by Dr. Chen Peng of our laboratory on few-shot biomedical information extraction was accepted by Bioinformatics. Bioinformatics is one of the leading journals in the field of bioinformatics; it is rated as a CCF Class B journal by the China Computer Federation and a JCR Q1 journal, with an impact factor of 5.8.
Title: Learning to Explain is a Good Biomedical Few-shot Learner
Significant progress has been achieved in biomedical text mining using deep learning methods, which rely heavily on large amounts of high-quality data annotated by human experts. In reality, however, obtaining high-quality annotated data is extremely challenging due to data scarcity (e.g., rare or new diseases), data privacy and security concerns, and the high cost of annotation. Additionally, nearly all research focuses on predicting labels without providing corresponding explanations. In this paper, we therefore investigate a more realistic scenario, biomedical few-shot learning, and explore the impact of interpretability on it. We present LetEx (Learning to Explain), a novel multi-task generative approach that leverages reasoning explanations from large language models (LLMs) to enhance the inductive reasoning ability of few-shot learning. Our approach (1) collects high-quality explanations by devising a complete LLM-based workflow that combines chain-of-thought (CoT) prompting with self-training, and (2) converts various biomedical NLP tasks into a unified text-to-text generation task, in which the collected explanations serve as additional supervision between text-label pairs during multi-task training. Experiments are conducted under three few-shot settings across six biomedical benchmark datasets. The results show that learning to explain improves the performance of diverse biomedical NLP tasks in low-resource scenarios, outperforming strong baseline models by up to 6.41%. Notably, the proposed method gives the 220M-parameter LetEx reasoning-explanation ability superior to that of LLMs.
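To make the second step of the abstract concrete, here is a minimal illustrative sketch (not the authors' code; all function and prefix names are hypothetical) of how a labeled biomedical example plus an LLM-generated explanation might be converted into unified text-to-text pairs, with the explanation acting as additional supervision alongside the plain text-label pair:

```python
def to_text2text_pairs(task, text, label, explanation):
    """Build two supervision pairs from one example: predict the label,
    and generate the label together with its explanation."""
    # A task prefix lets one generative model handle different biomedical
    # NLP tasks (classification, relation extraction, etc.) uniformly.
    predict_input = f"[{task}] predict: {text}"
    explain_input = f"[{task}] explain: {text}"
    return [
        (predict_input, label),                             # label supervision
        (explain_input, f"{label} because {explanation}"),  # explanation supervision
    ]

# Hypothetical example in the style of a biomedical relation-extraction task.
pairs = to_text2text_pairs(
    task="relation extraction",
    text="Aspirin reduces the risk of myocardial infarction.",
    label="treats",
    explanation="the sentence states that aspirin lowers the risk of the disease",
)
```

Training a single encoder-decoder model on both kinds of pairs is what the abstract refers to as multi-task training; the exact prompt format used in the paper may differ.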