Lab Takes Part in the BioNLP-ST 2016 Evaluation and Achieves Excellent Results
News source: IR Lab       Published: 2016/7/16 12:55:31

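Lab members Honglei Li (李虹磊) and Jianhai Zhang (张建海) recently took part in the 2016 BioNLP Shared Task evaluation (BioNLP-ST 2016) and achieved excellent results. The evaluation was organized by the French National Institute for Agricultural Research (INRA) and the Database Center for Life Science (DBCLS) in Japan, and drew wide attention and participation from research institutions and universities around the world. The shared task targets the extraction of biomedical events that capture complex relations among fine-grained biological entities, and it provides high-quality annotated corpora, tools, and evaluation services for molecular biology and medical research.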

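BioNLP-ST 2016 is the fourth edition of the BioNLP-ST evaluation. The first three editions attracted a large number of researchers who took part and submitted results. The first edition, BioNLP-ST 2009, introduced the task definition of event extraction; its data set was built by annotating articles related to the 'NFKB' regulatory factor selected from PubMed abstracts. The second edition, BioNLP-ST 2011, had generalization as its theme: it aimed to apply existing event extraction systems to other subdomains, proposed several new event extraction tasks, and annotated full-text data. The third edition, BioNLP-ST 2013, focused mainly on knowledge base construction. The fourth edition, BioNLP-ST 2016, further broadens the applications of biomedical text mining and introduces new topics of current interest. It comprises three main tasks: SeeDev, BB3, and GE4. The SeeDev corpus mainly covers seed development in the biological domain. The BB task aims to automatically extract complex relations between microorganisms and their habitats from the biological literature. The GE task reuses the corpus resources of earlier editions but places more emphasis on extracting many kinds of knowledge from them, and it is an open task.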

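This time we took part in two of the subtasks, the SeeDev-binary task and the BB-event task, both of which aim to extract complex relations between biological entities. We placed second and seventh in the two tasks, respectively. The evaluation paper is titled "DUTIR in BioNLP-ST 2016: Utilizing Convolutional Network and Distributed Representation to Extract Complicate Relations", and its abstract reads as follows: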

Abstract ---- We participate in the two event extraction tasks of BioNLP 2016 Shared Task: binary relation extraction of SeeDev task and localization relations extraction of Bacteria Biotope task. Convolutional neural network (CNN) is employed to model the sentences by convolution and max-pooling operation from raw input with word embedding. Then, full connected neural network is used to learn senior and significant features automatically. The proposed model mainly contains two modules: distributive semantic representation building, such as word embedding, POS embedding, distance embedding and entity type embedding, and CNN model training. The results with F-score of 0.370 and 0.478 in our participant tasks, which were evaluated on the test data set, show that our proposed method contributes to binary relation extraction effectively and can reduce the impact of artificial feature engineering through automatically feature learning.
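
The pipeline the abstract describes (per-token embeddings concatenated into one vector, then convolution and max-pooling over the sentence) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, embedding sizes, window width, ReLU activation, random weights, and the toy sentence are all assumptions made for illustration, and the entity type embedding and the fully connected classifier are left out for brevity.

```python
import random

def relative_distances(n_tokens, entity_index):
    """Signed distance from every token position to one entity mention."""
    return [i - entity_index for i in range(n_tokens)]

def make_table(keys, dim, rng):
    """Hypothetical embedding table: one small random vector per key."""
    return {k: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for k in keys}

def build_token_vectors(tokens, pos_tags, e1, e2, tables, max_dist):
    """Concatenate word, POS, and two distance embeddings for each token."""
    d1 = relative_distances(len(tokens), e1)
    d2 = relative_distances(len(tokens), e2)
    clip = lambda d: max(-max_dist, min(max_dist, d))
    rows = []
    for tok, tag, a, b in zip(tokens, pos_tags, d1, d2):
        rows.append(tables["word"][tok] + tables["pos"][tag]
                    + tables["dist"][clip(a)] + tables["dist"][clip(b)])
    return rows

def conv_max_pool(rows, filters, window):
    """Apply each filter to every window of tokens, keep the maximum response."""
    pooled = []
    for weights in filters:  # flat weight list of length window * vector size
        best = float("-inf")
        for start in range(len(rows) - window + 1):
            flat = [x for row in rows[start:start + window] for x in row]
            score = sum(w * x for w, x in zip(weights, flat))
            best = max(best, max(0.0, score))  # ReLU (assumed), then max over time
        pooled.append(best)
    return pooled

# Toy input (hypothetical): the two candidate entities sit at positions 0 and 3.
rng = random.Random(0)
tokens = ["GeneA", "regulates", "seed", "development"]
pos_tags = ["NN", "VBZ", "NN", "NN"]
MAX_DIST, WINDOW, N_FILTERS = 5, 3, 6

tables = {
    "word": make_table(sorted(set(tokens)), 4, rng),
    "pos": make_table(sorted(set(pos_tags)), 2, rng),
    "dist": make_table(range(-MAX_DIST, MAX_DIST + 1), 2, rng),
}
rows = build_token_vectors(tokens, pos_tags, 0, 3, tables, MAX_DIST)
vector_size = len(rows[0])  # 4 + 2 + 2 + 2 = 10
filters = [[rng.uniform(-0.5, 0.5) for _ in range(WINDOW * vector_size)]
           for _ in range(N_FILTERS)]
sentence_features = conv_max_pool(rows, filters, WINDOW)
print(len(sentence_features))  # one pooled value per filter
```

In a full system the pooled vector would feed a fully connected layer that predicts the relation type, and the embedding tables and filters would be learned from the training data rather than drawn at random.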