    PhD student Chen Peng's research on few-shot entity recognition accepted by Bioinformatics
    2023-09-02 21:26  

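      Chen Peng, a PhD student in our laboratory, has recently had his research on few-shot biomedical named entity recognition accepted by the journal Bioinformatics. Bioinformatics is one of the leading journals in the field of bioinformatics. It is rated Class B by the China Computer Federation (CCF), is a JCR Q1 journal, and has an impact factor of 5.8.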

      Title: Few-shot biomedical named entity recognition via knowledge-guided instance generation and prompt contrastive learning

      Motivation: Few-shot learning, which can perform named entity recognition effectively in low-resource scenarios, has attracted growing attention, but it has not yet been widely studied in the biomedical field. In contrast to high-resource domains, biomedical named entity recognition (BioNER) often has only limited human-labeled data in real-world scenarios, which leads to poor generalization when a model is trained on only a few labeled instances. Recent approaches either leverage cross-domain high-resource data or fine-tune a pre-trained masked language model on the limited labeled samples to generate new synthetic data; these approaches tend to suffer from domain shift or to produce low-quality synthetic data. Therefore, in this article, we study a more realistic scenario, i.e. few-shot learning for BioNER.

      Results: Leveraging a domain knowledge graph, we propose knowledge-guided instance generation for few-shot BioNER, which generates diverse and novel entities based on the similar semantic relations of neighboring nodes. In addition, by introducing question prompts, we cast BioNER as a question-answering task and propose prompt contrastive learning, which improves the robustness of the model by measuring the mutual information between query-answer pairs. Extensive experiments under various few-shot settings show that the proposed framework achieves superior performance. In particular, in a low-resource scenario with only 20 samples, our approach substantially outperforms recent state-of-the-art models on four benchmark datasets, achieving an average F1 improvement of up to 7.1%.
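      The three ideas in the abstract can be illustrated with a small, self-contained sketch. This is not the authors' implementation. The toy knowledge graph, the rule "swap an entity for another node that shares a (relation, tail) pair with it", the question template, and the use of an InfoNCE loss are all assumptions made for illustration. InfoNCE is a standard contrastive objective whose minimization maximizes a lower bound on mutual information, but the paper's exact objective may differ. All names below (TRIPLES, similar_entities, generate_instances, question_prompt, info_nce) are hypothetical.

```python
import math

# Hypothetical toy biomedical knowledge graph as (head, relation, tail) triples.
TRIPLES = [
    ("aspirin", "treats", "headache"),
    ("ibuprofen", "treats", "headache"),
    ("paracetamol", "treats", "headache"),
    ("aspirin", "is_a", "NSAID"),
]


def similar_entities(entity, triples):
    """Return KG nodes that share at least one (relation, tail) pair with `entity`."""
    signatures = {(r, t) for h, r, t in triples if h == entity}
    candidates = {h for h, r, t in triples if (r, t) in signatures and h != entity}
    return sorted(candidates)


def generate_instances(tokens, span, triples):
    """Create new labeled sentences by swapping the entity at token span (start, end).

    Assumption: a node with similar relations keeps the original entity type,
    so the label of the new span is unchanged.
    """
    start, end = span
    entity = " ".join(tokens[start:end])
    new_instances = []
    for cand in similar_entities(entity, triples):
        cand_tokens = cand.split()
        new_tokens = tokens[:start] + cand_tokens + tokens[end:]
        new_instances.append((new_tokens, (start, start + len(cand_tokens))))
    return new_instances


def question_prompt(entity_type):
    """Cast NER as QA with one question per entity type (hypothetical template)."""
    return f"Find all {entity_type} entities in the text."


def info_nce(query, positive, negatives, temperature=0.1):
    """InfoNCE loss for one query vector: -log softmax score of the positive answer."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

    logits = [cos(query, positive) / temperature]
    logits += [cos(query, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - logits[0]


if __name__ == "__main__":
    sentence = ["aspirin", "relieves", "mild", "headache"]
    for tokens, span in generate_instances(sentence, (0, 1), TRIPLES):
        print(" ".join(tokens), span)
    print(question_prompt("Chemical"))
    print(info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]]))
```

      Run on the toy sentence, the generator yields two new labeled sentences ("ibuprofen relieves mild headache" and "paracetamol relieves mild headache"), because both drugs share the (treats, headache) relation with aspirin. The contrastive loss is near zero when the query embedding matches its answer embedding and grows when it is closer to a negative.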



