PhD Student Luo Ling's Paper Accepted by the Top Bioinformatics Journal Bioinformatics
News source: IR Lab       Published: 2017/11/24

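The lab recently received an email from the editorial office of Bioinformatics, a top journal in bioinformatics, confirming that the paper "An Attention-based BiLSTM-CRF Approach to Document-level Chemical Named Entity Recognition" by PhD student Luo Ling has been accepted. Bioinformatics is one of the most prestigious and influential journals in the field. Many original papers reporting important findings that use computational algorithms to study biological problems are published there, and the journal enjoys high academic recognition; its 2016 impact factor was 7.307. This is also the third Bioinformatics paper accepted from the lab's faculty and students within two years, which indicates that the lab's recent research is at a relatively high level.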

 

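Paper summary: In biomedical research, chemicals are an important class of entities, and chemical named entity recognition (NER) plays an important role in biomedical information extraction. Most mainstream chemical NER methods, however, are based on traditional machine learning, and their performance depends on manual feature engineering. These methods also work mainly at the sentence level, so the same entity appearing in different sentences of one document is often assigned different labels, which leads to inconsistent entity tagging. To address these problems, the paper proposes an attention-based neural network approach (Att-BiLSTM-CRF) for document-level chemical NER. The approach uses an attention mechanism to learn document-level global information and improve the consistency of entity tagging within a document. Experiments on the BioCreative IV CHEMDNER dataset and the BioCreative V CDR dataset, both from international evaluation challenges, show that the approach does not need extensive feature engineering and achieves F-scores of 91.14% and 92.57% on the two datasets respectively, outperforming other existing methods.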

 

Abstract

In biomedical research, chemical is an important class of entities, and chemical named entity recognition (NER) is an important task in the field of biomedical information extraction. However, most popular chemical NER methods are based on traditional machine learning and their performances are heavily dependent on the feature engineering. Moreover, these methods are sentence-level ones which have the tagging inconsistency problem. In this paper, we propose a neural network approach, i.e., attention-based bidirectional Long Short-Term Memory with a conditional random field layer (Att-BiLSTM-CRF), to document-level chemical NER. The approach leverages document-level global information obtained by attention mechanism to enforce tagging consistency across multiple instances of the same token in a document. It achieves better performances with little feature engineering than other state-of-the-art methods on the BioCreative IV chemical compound and drug name recognition (CHEMDNER) corpus and the BioCreative V chemical-disease relation (CDR) task corpus (the F-scores of 91.14% and 92.57%, respectively).