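Paper by PhD student Zheng Wei (郑巍) accepted by BMC Bioinformatics
Source: IR Lab       Published: 2018/8/15 11:09:37

  The editorial office recently notified us that the paper "A document level neural model integrated domain knowledge for chemical-induced disease relations" by PhD student Zheng Wei has been accepted by BMC Bioinformatics. The abstract follows: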


Abstract:

Background: The effective combination of texts and knowledge may improve the performance of natural language processing tasks. For the recognition of chemical-induced disease (CID) relations, which may span sentence boundaries in an article, existing CID systems have explored the use of knowledge bases, but they have not distinguished the effects of different knowledge on the identification of a specific CID relation. Moreover, systems based on neural networks have only constructed sentence-level or mention-level models.

Results: In this work, we proposed an effective document-level neural model integrating domain knowledge to extract CID relations from biomedical articles. Basic semantic information of an article with respect to a specific CID candidate pair was learned by the document-level sub-network module. Furthermore, knowledge attention depending on the representation of the article was proposed to distinguish the influences of different knowledge on the specific CID pair, and the final representation of knowledge was then formed by aggregating the weighted knowledge. Finally, the integrated representations of texts and knowledge were passed to a softmax classifier to perform the CID recognition. Experimental results on the chemical-disease relation corpus proposed by BioCreative V show that our knowledge-integrated system achieves good overall performance compared with other state-of-the-art systems.

Conclusions: Experimental analyses demonstrate that the introduced attention mechanism on domain knowledge plays a significant role in distinguishing the influences of different knowledge on the judgment of a specific CID relation.

Keywords: Chemical-induced diseases, Document level, Knowledge, Attention mechanism, Neural network, Text mining
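
The pipeline outlined in the Results paragraph (attention over knowledge conditioned on the article representation, weighted aggregation, then a softmax classifier over the combined representation) can be illustrated with a minimal sketch. The abstract does not specify the scoring function, the dimensions, or how the representations are combined, so the bilinear score, the concatenation step, the random vectors, and every function and variable name below are assumptions for illustration only, not the authors' implementation.

```python
import math
import random


def softmax(scores):
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(matrix, vector):
    return [dot(row, vector) for row in matrix]


def knowledge_attention(doc_repr, knowledge, W):
    """Weight each knowledge vector by its relevance to the document.

    doc_repr:  length-d document representation for one candidate pair
    knowledge: k vectors of length m (hypothetical knowledge embeddings)
    W:         m x d matrix for a bilinear score (assumed scoring function)
    """
    projected = matvec(W, doc_repr)                    # length m
    scores = [dot(kv, projected) for kv in knowledge]  # one score per item
    weights = softmax(scores)                          # attention weights
    m = len(knowledge[0])
    # Weighted sum of the knowledge vectors.
    aggregated = [
        sum(w * kv[j] for w, kv in zip(weights, knowledge)) for j in range(m)
    ]
    return aggregated, weights


def classify(doc_repr, aggregated, W_out, b_out):
    # Assumed combination: concatenate text and knowledge representations.
    features = doc_repr + aggregated
    logits = [z + b for z, b in zip(matvec(W_out, features), b_out)]
    return softmax(logits)  # probabilities for (no relation, CID relation)


random.seed(0)
d, m, k = 8, 6, 4  # illustrative sizes, not taken from the paper
doc_repr = [random.gauss(0, 1) for _ in range(d)]
knowledge = [[random.gauss(0, 1) for _ in range(m)] for _ in range(k)]
W = [[random.gauss(0, 1) for _ in range(d)] for _ in range(m)]
W_out = [[random.gauss(0, 1) for _ in range(d + m)] for _ in range(2)]
b_out = [0.0, 0.0]

aggregated, weights = knowledge_attention(doc_repr, knowledge, W)
probs = classify(doc_repr, aggregated, W_out, b_out)
```

Because the attention weights are a softmax over the knowledge items, they sum to one, so the aggregated vector is a convex combination of the knowledge vectors in which items scored as more relevant to the article contribute more.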

 

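 Chinese abstract (English translation):

Background: Effectively combining text and knowledge can improve the performance of natural language processing tasks. For recognizing chemical-induced disease (CID) relations, which may span sentence boundaries within an article, existing CID systems have explored the use of knowledge, but they do not distinguish the influence of different knowledge on the identification of a specific CID relation. Moreover, existing neural network-based systems only build sentence-level or mention-level models.

Results: This paper proposes an effective document-level neural model that integrates domain knowledge to extract CID relations from biomedical articles. First, a document-level sub-network module learns the basic semantic information of an article with respect to a specific chemical-disease candidate pair. Next, knowledge attention that depends on the article representation is proposed to distinguish the influence of different knowledge on the specific chemical-disease pair, and the learned weighted knowledge is then summed to form the final knowledge representation. Finally, the integrated text and knowledge representations are passed to a softmax classifier to recognize CID relations. Experimental results on the chemical-disease relation corpus proposed by BioCreative V show that, compared with current state-of-the-art systems, the knowledge-integrated system achieves good overall performance.

Conclusions: Experimental analyses show that the proposed attention mechanism over domain knowledge plays an important role in distinguishing the influence of different knowledge on a specific CID relation.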