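The International Conference on Computational Linguistics (COLING 2020), a leading international venue in the field, recently announced its list of accepted papers. Work by Chen Yanguang (陈彦光), a master's student in our lab, was accepted as a long paper and will be given as an oral presentation at the conference. COLING is ranked as a B-level international conference by the CCF and is organized by the International Committee on Computational Linguistics (ICCL). It has been held since 1965, every two years with few exceptions, and has drawn sustained attention from computer science and linguistics researchers around the world. The lab has long worked on natural language processing and looks forward to this exchange with colleagues in China and abroad. The title and abstract of the accepted paper are as follows: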
Title: Joint Entity and Relation Extraction for Legal Documents with Legal Feature Enhancement
Abstract: In recent years, the plentiful information contained in Chinese legal documents has attracted a great deal of attention because of the large-scale release of the judgment documents on China Judgments Online. It is in great need of enabling machines to understand the semantic information stored in the documents which are transcribed in the form of natural language. The technique of information extraction provides a way of mining the valuable information implied in the unstructured judgment documents. We propose a Legal Triplet Extraction System for drug-related criminal judgment documents. The system extracts the entities and the semantic relations jointly and benefits from the proposed legal lexicon feature and multi-task learning framework. Furthermore, we manually annotate a dataset for Named Entity Recognition and Relation Extraction in Chinese legal domain, which contributes to training supervised triplet extraction models and evaluating the model performance. Our experimental results show that the legal feature introduction and multi-task learning framework are feasible and effective for the Legal Triplet Extraction System. The F1 score of triplet extraction finally reaches 0.836 on the legal dataset.
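Summary: The paper targets the case-fact description section of criminal judgments in drug-related cases and proposes an entity and relation extraction model. The model is based on joint learning, introduces a multi-task learning framework and drug-name features, and reaches an F-score of 83.6% on the drug-related entity and relation dataset the authors constructed.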