Recently, EMNLP 2024 released its list of accepted papers, and three long papers from our lab were accepted to Findings. EMNLP is a top-tier international conference in computational linguistics and natural language processing, highly regarded in both academia and industry. It is rated as a Class B conference in the CCF recommended list and as a Class A conference in the Tsinghua University computer science venue list.
Accepted Paper 1: Breaking the Boundaries: A Unified Framework for Chinese Named Entity Recognition Across Text and Speech
Authors: 宁金忠 (Ph.D. student), et al.
Abstract: In recent years, with the vast and rapidly increasing amounts of spoken and textual data, Named Entity Recognition (NER) tasks have evolved into three distinct categories, i.e., text-based NER (TNER), speech-based NER (SNER), and multimodal NER (MNER). However, existing approaches typically require designing separate models for each task, overlooking the potential connections between tasks and limiting the versatility of NER methods. To mitigate these limitations, we introduce a new task named Integrated Multimodal NER (IMNER) to break the boundaries between different modal NER tasks, enabling a unified implementation of them. To achieve this, we first design a unified data format for inputs from different modalities. Then, leveraging the pre-trained MM-Speech model as the backbone, we propose an Integrated Multimodal Generation Framework (IMAGE), formulating the Chinese IMNER task as an entity-aware text generation task. Experimental results demonstrate the feasibility of our proposed IMAGE framework on the IMNER task. Our work on integrated multimodal learning for advancing NER performance may open a new direction for future research in the field.
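The entity-aware text generation formulation can be pictured with a small sketch. The serialization format below is a hypothetical illustration, not the paper's actual IMAGE target format:

```python
# Hypothetical sketch: formulating Chinese NER as entity-aware text generation.
# A seq2seq model reads the (text and/or speech) input and emits this target
# string; the "entity [LABEL]" scheme here is illustrative only.

def to_generation_target(text, entities):
    """Serialize (text, entities) into a single target string.

    entities: list of (start, end, label) character spans, non-overlapping.
    """
    parts = []
    for start, end, label in sorted(entities):
        parts.append(f"{text[start:end]} [{label}]")
    return " ; ".join(parts)

sentence = "小明在北京工作"
entities = [(0, 2, "PER"), (3, 5, "LOC")]
print(to_generation_target(sentence, entities))
# 小明 [PER] ; 北京 [LOC]
```

The appeal of a generation target like this is that the decoder produces the same kind of output regardless of whether the input was text, speech, or both, which is what allows a single model to cover all three modal variants of the task.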
Accepted Paper 2: PclGPT: A Large Language Model for Patronizing and Condescending Language Detection
Authors: 王宏博 (M.S. student), et al.
Abstract: Patronizing and condescending language (PCL) is a form of speech directed at vulnerable groups. As an essential branch of toxic language, this type of language exacerbates conflicts and confrontations among Internet communities and detrimentally impacts disadvantaged groups. Traditional pre-trained language models (PLMs) perform poorly in detecting PCL due to its implicit toxicity traits such as hypocrisy and false sympathy. With the rise of large language models (LLMs), we can harness their rich emotional semantics to optimize PCL detection. In this paper, we introduce PclGPT, a comprehensive LLM benchmark designed specifically for PCL. We collect, annotate, and integrate the Pcl-PT/SFT dataset, and then develop a bilingual PclGPT-EN/CN model group through a comprehensive pre-training and supervised fine-tuning staircase process to facilitate cross-language detection. Group detection results and fine-grained detection from PclGPT and other models reveal significant variations in the degree of bias in PCL towards different vulnerable groups, necessitating increased societal attention to protect them.
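As a rough illustration of the supervised fine-tuning stage, the snippet below builds an instruction-style training record for binary PCL detection. The field names, prompt wording, and label set are assumptions for illustration, not the actual Pcl-PT/SFT schema:

```python
# Hypothetical sketch of one SFT record for PCL detection.
# The schema (instruction/input/output/lang) mirrors common instruction-tuning
# formats and is assumed here, not taken from the paper.
import json

def make_sft_record(text, label, language="en"):
    """Wrap a raw post into an instruction-style fine-tuning example."""
    instruction = (
        "Decide whether the following post contains patronizing or "
        "condescending language (PCL) toward a vulnerable group. "
        "Answer 'PCL' or 'Not PCL'."
    )
    return {
        "instruction": instruction,
        "input": text,
        "output": "PCL" if label == 1 else "Not PCL",
        "lang": language,
    }

record = make_sft_record("They can't help themselves, poor things.", 1)
print(json.dumps(record, ensure_ascii=False))
```

A bilingual model group would pair records like this in English and Chinese so that one fine-tuning recipe supports cross-language detection.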
Accepted Paper 3: Exploring the Capability of Multimodal LLMs with Yonkoma Manga: The YManga Dataset and Its Challenging Tasks
Authors: 杨琦 (M.S. student), et al.
Abstract: Yonkoma manga, characterized by its four-panel structure, presents unique challenges due to its rich contextual information and strong sequential features. To address the limitations of current multimodal large language models in understanding this type of data, we create a novel dataset named YManga from the Internet. After filtering out low-quality content, we collect and annotate a dataset of 1,015 Yonkoma strips, each sample containing 10 annotations. We then define three challenging tasks for this dataset: panel sequence detection, generation of the author's creative intention, and description generation for masked panels. These tasks progressively introduce the complexity of understanding and utilizing such image-text data. To the best of our knowledge, YManga is the first dataset specifically designed for Yonkoma manga strip understanding. Extensive experiments conducted on this dataset reveal significant challenges faced by current multimodal large language models. Our results show a substantial performance gap between models and humans across all three tasks.
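A minimal sketch of how the panel sequence detection task might be scored, assuming exact-match accuracy over predicted four-panel orderings (the paper's actual metric may differ):

```python
# Hypothetical sketch: scoring panel sequence detection on YManga.
# Each sample is four shuffled panels; the model predicts a reading order
# as a tuple of panel indices. Exact-match accuracy is an assumed metric.

def sequence_accuracy(predictions, golds):
    """Fraction of samples whose predicted 4-panel ordering matches the gold."""
    correct = sum(p == g for p, g in zip(predictions, golds))
    return correct / len(golds)

preds = [(1, 2, 3, 4), (2, 1, 3, 4), (1, 2, 4, 3)]
golds = [(1, 2, 3, 4), (1, 2, 3, 4), (1, 2, 4, 3)]
print(sequence_accuracy(preds, golds))  # 2 of 3 orderings correct
```

Because the strips are strongly sequential, even this simplest of the three tasks requires the model to track narrative flow across panels, which is where the reported model-human gap appears.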