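Recently, COLING 2022, a top conference in natural language processing, announced its list of accepted papers. Two studies from our lab, by faculty member Bo Xu (徐博) and PhD student Jinzhong Ning (宁金忠), were accepted as long papers. The International Conference on Computational Linguistics (COLING) is a major international academic conference in natural language processing and computational linguistics. It is held every two years and is ranked as a Class B conference in the CCF recommended list.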
Title: RealMedDial: A Real Telemedical Dialogue Dataset Collected from Online Chinese Short-Video Clips (Chinese title: 基于医疗问诊短视频的真实场景医疗对话研究)
Authors: Bo Xu (徐博) et al.
Abstract: Intelligent medical services have attracted great research interest for providing automated medical consultation. However, the lack of corpora has become a main obstacle to related research, particularly the lack of data from real scenarios. In this paper, we construct RealMedDial, a Chinese medical dialogue dataset based on real medical consultation. RealMedDial contains 2,637 medical dialogues and 24,255 utterances obtained from Chinese short-video clips of real medical consultations. We collected and annotated a wide range of meta-data with respect to medical dialogue, including doctor profiles, hospital departments, diseases and symptoms, for fine-grained analysis of language usage patterns and clinical diagnosis. We evaluate the performance of medical response generation, department routing and doctor recommendation on RealMedDial. Results show that RealMedDial is applicable to a wide range of NLP tasks with respect to medical dialogue.
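Summary: In recent years, self-service intelligent medical dialogue services have drawn considerable attention, but the lack of medical dialogue corpora from real scenarios has held back related research. This paper builds RealMedDial, a Chinese medical dialogue corpus based on real medical consultation data. The corpus contains 2,637 medical dialogues and 24,255 utterances, and its real consultations come from the Chinese short-video platform Kuaishou. Beyond the dialogue content, we also collected and annotated dialogue metadata, including doctor profiles, hospital departments, diseases, and symptoms, to support fine-grained analysis of medical language usage patterns and clinical diagnosis. We validated the usability of the corpus on medical response generation, department routing, and doctor recommendation, providing data support and a research basis for related work in medical natural language processing.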
Title: Two Languages Are Better Than One: Bilingual Enhancement For Chinese Named Entity Recognition (Chinese title: 双语言胜过单语言:基于双语增强的中文命名实体识别)
Authors: Jinzhong Ning (宁金忠) et al.
Abstract: Chinese Named Entity Recognition (NER) has continued to attract research attention. However, most existing studies only explore the internal features of the Chinese language but neglect other lingual modal features. Actually, as another modal knowledge of the Chinese language, English contains rich prompts about entities that can potentially be applied to improve the performance of Chinese NER. Therefore, in this study, we explore the bilingual enhancement for Chinese NER and propose a unified bilingual interaction module called the Adapted Cross-Transformers with Global Sparse Attention (ACT-S) to capture the interaction of bilingual information. We utilize a model built upon several different ACT-Ss to integrate the rich English information into the Chinese representation. Moreover, our model can learn the interaction of information between bilinguals (inter-features) and the dependency information within Chinese (intra-features). Compared with existing Chinese NER methods, our proposed model can better handle entities with complex structures. The English text that enhances the model is automatically generated by machine translation, avoiding high labour costs. Experimental results on four well-known benchmark datasets demonstrate the effectiveness and robustness of our proposed model.
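Summary: Chinese NER has long received wide attention. However, existing methods use only features internal to the Chinese modality and ignore features from other language modalities. As another modality of a Chinese text, its corresponding English text contains cues that can potentially improve Chinese NER performance. This study therefore explores using bilingual information to enhance Chinese NER and proposes a unified bilingual interaction module, ACT-S, that captures both the dependencies between the two languages and the dependencies within the Chinese modality. Compared with existing methods, the proposed model labels entities with complex structures more accurately. The English text used to enhance the model is produced entirely by machine translation tools, so it requires no additional manual effort. Experimental results show that the model achieves state-of-the-art (SOTA) results on four public datasets.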