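Recently, research on hate speech detection by Min Changrong (闵昶榮), a PhD student in the lab, was accepted by Information Fusion, a top journal in computer science. The journal is ranked in Zone 1 of the Chinese Academy of Sciences (CAS) journal ranking and has an impact factor of 17.564.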
Title: Finding hate speech with auxiliary emotion detection from self-training multi-label learning perspective
Abstract: Hate Speech Detection (HSD) aims to identify whether a text contains hate speech content, which often refers to discrimination and is even associated with a hate crime. The mainstream methods jointly train the HSD problem with relevant auxiliary problems, e.g., emotion detection and sentiment analysis, under the paradigm of Multi-Task Learning (MTL). In this paper, we improve HSD by integrating it with emotion detection, since we take inspiration from the potential correlations between hate speech and certain negative emotion states, which have been studied theoretically and empirically. To be specific, we can concatenate their hateful labels and predicted emotion states as pseudo-multiple labels for hate speech samples, formulating a pseudo-Multi-Label Learning (MLL) problem. Beyond the existing MTL-HSD methods, we further incorporate this pseudo-MLL problem and solve it by capturing the correlations between hate speech and negative emotion states, so as to improve the performance of HSD. Based on these ideas, we propose a novel HSD method named the Emotion-correlated Hate Speech DetectOR (EHSor). We conduct extensive experiments to evaluate EHSor, and the results show that it can consistently outperform the existing HSD methods across benchmark datasets.
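Hate speech detection aims to determine whether a text contains hate speech, which typically involves discrimination and is associated with hate crimes. Mainstream methods train the hate speech detection task jointly with auxiliary tasks, such as emotion detection and sentiment analysis, under the multi-task learning (MTL) paradigm. This paper improves hate speech detection by integrating it with emotion detection, drawing on studies of the potential correlation between hate speech and certain negative emotion states. Specifically, it treats the hateful label and the predicted emotion states as pseudo-multiple labels for hate speech samples, recasting the task as a multi-label learning problem. Building on existing MTL-based hate speech detection methods, it incorporates this pseudo-multi-label learning idea and captures the correlations between hate speech and negative emotion states to improve performance. Based on these ideas, the paper proposes a method named the Emotion-correlated Hate Speech DetectOR (EHSor). Experimental results show that the method consistently outperforms existing baseline methods on benchmark datasets.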
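The pseudo-multi-label construction described above can be illustrated with a minimal Python sketch. This is only an illustration, not the paper's implementation: the function name `build_pseudo_labels`, the 0.5 threshold, and the emotion ordering are all assumptions made here, and the actual method may derive and use emotion states differently.

```python
from typing import List, Sequence


def build_pseudo_labels(
    hate_label: int,
    emotion_probs: Sequence[float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> List[int]:
    """Concatenate a hate label with binarized predicted emotion states."""
    # Turn each predicted emotion probability into a 0/1 emotion state.
    emotion_states = [1 if p >= threshold else 0 for p in emotion_probs]
    # The pseudo-multi-label vector is the hate label followed by the states.
    return [hate_label] + emotion_states


# Hypothetical emotion order: anger, disgust, fear, joy.
example = build_pseudo_labels(1, [0.91, 0.72, 0.10, 0.03])
print(example)  # [1, 1, 1, 0, 0]
```

A multi-label learner trained on such vectors could then, in principle, exploit co-occurrence between the hate label and negative emotion states such as anger or disgust.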