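Recently, a paper on multi-modal session-based recommendation by Xiaokun Zhang, a PhD student in our lab, was accepted by IEEE Transactions on Knowledge and Data Engineering (TKDE), a top journal in data mining. TKDE is one of the most authoritative international journals in the data mining field; it is a Class A journal in the China Computer Federation (CCF) ranking and a Zone 1 journal in the Chinese Academy of Sciences ranking.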
Title: Beyond Co-occurrence: Multi-modal Session-based Recommendation
Abstract: Session-based recommendation is devoted to characterizing preferences of anonymous users based on short sessions. Existing methods mostly focus on mining limited item co-occurrence patterns exposed by item ID within sessions, while ignoring what attracts users to engage with certain items is rich multi-modal information displayed on pages. Generally, the multi-modal information can be classified into two categories: descriptive information (e.g., item images and description text) and numerical information (e.g., price). In this paper, we aim to improve session-based recommendation by modeling the above multi-modal information holistically. There are mainly three issues to reveal user intent from multi-modal information: (1) How to extract relevant semantics from heterogeneous descriptive information with different noise? (2) How to fuse these heterogeneous descriptive information to comprehensively infer user interests? (3) How to handle probabilistic influence of numerical information on user behaviors? To solve above issues, we propose a novel multi-modal session-based recommendation (MMSBR) that models both descriptive and numerical information under a unified framework. Specifically, a pseudo-modality contrastive learning is devised to enhance the representation learning of descriptive information. Afterwards, a hierarchical pivot transformer is presented to fuse heterogeneous descriptive information. Moreover, we represent numerical information with Gaussian distribution and design a Wasserstein self-attention to handle the probabilistic influence mode. Extensive experiments on three real-world datasets demonstrate the effectiveness of the proposed MMSBR. Further analysis also proves that our MMSBR can alleviate the cold-start problem in SBR effectively.
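Session-based recommendation aims to predict the preferences of anonymous users from their short-term behavior sequences and thereby provide personalized recommendations. Existing methods focus on mining the item co-occurrence patterns exposed by item IDs within sessions, and overlook the fact that what actually attracts users to interact with particular items is the multi-modal information about those items displayed on the page. This multi-modal information can be divided into descriptive information (item images and text) and numerical information (item price). The paper therefore proposes a multi-modal session-based recommendation model (MMSBR) that models this information in a unified way, in order to better understand user intent and improve recommendation performance. Specifically, the paper designs a pseudo-modality contrastive learning scheme to improve representation learning for item images and text, and proposes a hierarchical pivot transformer that fuses the heterogeneous image and text information into a representation of each item's descriptive features. In addition, numerical information is represented with Gaussian distributions, and a Wasserstein self-attention is designed to handle the probabilistic way in which price influences users. Extensive experiments demonstrate the effectiveness of the proposed MMSBR. Further analysis also shows that MMSBR can effectively alleviate the cold-start problem in session-based recommendation.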
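To give a rough feel for the idea of attending over Gaussian-distributed numerical features, the sketch below computes attention weights from the squared 2-Wasserstein distance between diagonal Gaussians, which has the closed form ||mu_a - mu_b||^2 + ||sigma_a - sigma_b||^2. It is only an illustration under assumptions: the announcement does not give the paper's actual formulation, and the function names, the negative-distance scoring, and the toy price values are all hypothetical.

```python
import math


def w2_squared(mu_a, sigma_a, mu_b, sigma_b):
    """Squared 2-Wasserstein distance between two diagonal Gaussians.

    Each Gaussian is given as a list of means and a list of standard deviations.
    """
    mean_term = sum((ma - mb) ** 2 for ma, mb in zip(mu_a, mu_b))
    std_term = sum((sa - sb) ** 2 for sa, sb in zip(sigma_a, sigma_b))
    return mean_term + std_term


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def wasserstein_attention(mus, sigmas):
    """Row-normalized attention weights; closer distributions get more weight.

    Assumption: the score is the negative squared distance. The paper may use
    a different scoring or scaling.
    """
    n = len(mus)
    weights = []
    for i in range(n):
        scores = [-w2_squared(mus[i], sigmas[i], mus[j], sigmas[j]) for j in range(n)]
        weights.append(softmax(scores))
    return weights


# Hypothetical toy session: three item prices, each embedded as a 1-D Gaussian.
mus = [[1.0], [1.2], [5.0]]
sigmas = [[0.5], [0.4], [1.0]]
weights = wasserstein_attention(mus, sigmas)
```

In this toy session the first item attends far more to the second item, whose price distribution is close to its own, than to the third. A distance between distributions, unlike a dot product between point embeddings, also accounts for the difference in spread.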