PhD Student Ling Luo's Paper Accepted by DATABASE
Source: IR Laboratory       Published: 2018/8/22 13:42:12

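The laboratory recently received an email from the editorial office of the bioinformatics journal DATABASE - The Journal of Biological Databases and Curation: the paper “Document triage for identifying protein-protein interactions affected by mutations: a neural network ensemble approach” by PhD student Ling Luo has been accepted. The journal has an impact factor of 3.978.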

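Summary: The precision medicine initiative aims to determine individualized treatment based on a patient's genes and related responses. To help health professionals and researchers working in precision medicine, the BioCreative VI challenge organized a Precision Medicine (PM) task to mine protein-protein interactions affected by genetic mutations (PPIm) from the biomedical literature. In this paper, we propose a neural network ensemble approach to identify articles relevant to PPIm. In this approach, multiple neural network models are used for document classification, and the ensemble outperforms any single model. In the official evaluation, our best submission achieved an F-score of 69.04% on the BioCreative VI PM document classification task. After the evaluation, to address the limited size of the training set, we added a pre-trained module that supplies prior knowledge to the model to further improve performance. Finally, our best ensemble method achieved an F-score of 71.04% on the test set.

(Data and code are available at https://github.com/lingluodlut/BioCreativeVI-PM-Track.)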

Abstract

The precision medicine initiative promises to identify individualized treatment depending on a patient’s genetic profile and their related responses. In order to help health professionals and researchers in the precision medicine endeavor, BioCreative VI organized a Precision Medicine (PM) Track to mine protein-protein interactions (PPI) affected by genetic mutations from the biomedical literature. In this paper, we present a neural network ensemble approach to identify relevant articles describing PPI affected by mutations (PPIm). In this approach, several neural network models are used for document triage, and the ensemble performs better than any individual model. In the official runs, our best submission achieves an F-score of 69.04% in the BioCreative VI PM document triage task. After post-challenge analysis, to address the problem of the limited size of the training set, a PPI pre-trained module is incorporated into our approach to further improve the performance. Finally, our best ensemble method achieves an F-score of 71.04% on the test set.

Database URL: Data and code are available at https://github.com/lingluodlut/BioCreativeVI-PM-Track.