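Recently, research on biomedical information extraction by PhD student Chen Peng was accepted for publication in the IEEE journal Transactions on Computational Biology and Bioinformatics (TCBB). TCBB is a Class B journal on the China Computer Federation (CCF) recommended list.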
Title: Knowledge Adaptive Multi-way Matching Network for Biomedical Named Entity Recognition via Machine Reading Comprehension
Abstract: Rapid and effective use of the biomedical literature is paramount in combating diseases such as COVID-19. Biomedical named entity recognition (BioNER) is a fundamental text-mining task. It can help physicians accelerate knowledge discovery and so help curb the spread of the COVID-19 epidemic. Recent approaches have shown that casting entity extraction as a machine reading comprehension (MRC) task can significantly improve model performance. However, two major drawbacks prevent further gains in entity identification. First, existing approaches ignore domain knowledge and therefore cannot capture context beyond the sentence. Second, they cannot understand the intent of the questions in depth. Previous work has focused mostly on the text sequence and has explored domain knowledge very little. To remedy this, we introduce and explore external domain knowledge that cannot be learned implicitly from the text sequence. To incorporate this knowledge more effectively, we devise a multi-way matching reader mechanism. It models the interactions among the text sequence, the question, and knowledge retrieved from the Unified Medical Language System (UMLS). As a result, our model can better understand the intent of questions in complex contexts. Experimental results indicate that incorporating domain knowledge yields competitive results across 10 BioNER datasets, with absolute F1 improvements of up to 2.02%.
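The abstract's central idea is to cast BioNER as machine reading comprehension. Each entity type becomes a natural-language question, and the model extracts the answer spans from the sentence. The Python snippet below is a minimal sketch of the data-conversion step only, and it assumes BIO-tagged input. The question templates and the names `QUESTIONS`, `MRCExample`, `bio_to_spans`, and `to_mrc_examples` are hypothetical illustrations, not the paper's code. The sketch does not model the multi-way matching reader or the UMLS knowledge retrieval.

```python
from dataclasses import dataclass

# Hypothetical question templates, one per entity type (assumed, not from the paper).
QUESTIONS = {
    "Disease": "Which disease names are mentioned in the text?",
    "Chemical": "Which chemicals or drugs are mentioned in the text?",
}


@dataclass
class MRCExample:
    question: str
    context: list[str]
    spans: list[tuple[int, int]]  # inclusive (start, end) token indices of answers


def bio_to_spans(tags):
    """Collect (entity_type, start, end) spans from a list of BIO tags."""
    spans = []
    start, etype = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # "O" sentinel closes a trailing entity
        continues = tag.startswith("I-") and start is not None and tag[2:] == etype
        if continues:
            continue
        if start is not None:  # the current entity ends just before position i
            spans.append((etype, start, i - 1))
            start, etype = None, None
        if tag != "O":  # B-X, or a stray I-X treated as the start of a new entity
            start, etype = i, tag[2:]
    return spans


def to_mrc_examples(tokens, tags):
    """Turn one tagged sentence into one (question, context, answer spans) example per type."""
    spans = bio_to_spans(tags)
    examples = []
    for etype, question in QUESTIONS.items():
        gold = [(s, e) for t, s, e in spans if t == etype]
        examples.append(MRCExample(question, tokens, gold))
    return examples


if __name__ == "__main__":
    tokens = ["Aspirin", "reduces", "the", "risk", "of", "colorectal", "cancer"]
    tags = ["B-Chemical", "O", "O", "O", "O", "B-Disease", "I-Disease"]
    for ex in to_mrc_examples(tokens, tags):
        print(ex.question, [" ".join(tokens[s:e + 1]) for s, e in ex.spans])
```

A reader model would then predict start and end positions for each question. Under this framing, a question whose span list is empty has no answer in that sentence.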