Paper by PhD student Wei Zheng (郑巍) accepted by Journal of Biomedical Informatics
Source: IR Lab       Published: 2018/5/9 21:16:39

On May 9, 2018, the editorial office notified us that the paper "An effective neural model extracting document level chemical-induced disease relations from biomedical literature" by PhD student Wei Zheng (郑巍) has been accepted by the Journal of Biomedical Informatics.


The abstract of the paper follows:

  Since identifying chemical-disease relations (CDR) is important for biomedical research and healthcare, the challenge proposed by BioCreative V requires automatically mining causal relationships between chemicals and diseases, which may span sentence boundaries. Although most systems explore feature engineering and knowledge bases to recognize document-level CDR relations, automatic feature learning has so far been limited to the sentence level.

  In this work, we propose an effective model that automatically learns document-level semantic representations to extract chemical-induced disease (CID) relations from articles by combining the advantages of convolutional and recurrent neural networks. First, to collect contexts purposefully, candidate entities occurring in multiple sentences of an article are masked so that the model can distinguish candidate entities from general terms. Next, considering the contiguity and temporality among associated sentences as well as the topic of the article, a hierarchical network architecture is designed at the document level to capture the semantic information of different types of text segments in the article. Finally, a softmax classifier performs the CID recognition.
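The entity-masking step described above can be illustrated with a minimal sketch. It is not the paper's implementation: the placeholder tokens `<CHEM>` and `<DIS>`, the function name `mask_candidates`, the single-token mention matching, and the example sentences are all assumptions made for illustration, since the abstract does not specify them.

```python
from typing import List, Set

# Hypothetical placeholder tokens; the abstract does not name the ones actually used.
CHEMICAL_MASK = "<CHEM>"
DISEASE_MASK = "<DIS>"


def mask_candidates(
    sentences: List[List[str]],
    chemical_mentions: Set[str],
    disease_mentions: Set[str],
) -> List[List[str]]:
    """Replace every mention of the candidate chemical/disease pair with a placeholder.

    Assumes single-token, lowercase mention strings (a simplification).
    Other tokens, including other entities and general terms, are left unchanged.
    """
    masked = []
    for tokens in sentences:
        row = []
        for tok in tokens:
            key = tok.lower()
            if key in chemical_mentions:
                row.append(CHEMICAL_MASK)
            elif key in disease_mentions:
                row.append(DISEASE_MASK)
            else:
                row.append(tok)
        masked.append(row)
    return masked


# Made-up two-sentence document with one candidate pair spanning both sentences.
doc = [
    ["Lithium", "was", "given", "daily", "."],
    ["The", "patient", "later", "developed", "hypothyroidism", "."],
]
masked_doc = mask_candidates(doc, {"lithium"}, {"hypothyroidism"})
```

Because the same placeholders are used in every sentence of the document, a downstream network sees the candidate pair in a uniform form wherever it occurs, which is one plausible way to let it tell candidate entities apart from general terms.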

  Experimental results on the CDR corpus show that the proposed model achieves good overall performance compared with other state-of-the-art methods. Although it uses only two types of embedding vectors, our approach performs well at recognizing both intra-sentential and inter-sentential CID relations.

