    PhD Student Zeng Jingjie's Research on Multimodal Large Language Models Accepted by ACM MM 2024
    2024-07-29 18:02  Lu Junyu

    Recently, ACM MM 2024, a top-tier international conference in the multimedia field, announced its paper acceptance results. PhD student Zeng Jingjie's research on multimodal large language models was accepted by ACM MM 2024 as an Oral presentation (174/4385, 3.97%). ACM MM is recommended by the CCF as a Class A international academic conference.



    Abstract:

    By integrating various modules with the Vision Transformer (ViT), we facilitate an interpretation of image processing at each layer and attention head. This method allows us to explore the connections both within and across layers, enabling an analysis of how images are processed at different depths. We analyze the contributions of each layer and attention head, shedding light on the intricate interactions and functionalities within the model's layers. This in-depth exploration not only highlights the visual cues passed between layers but also examines their capacity to navigate the transition from abstract concepts to tangible objects. It unveils the model's mechanism for building an understanding of images and provides a strategy for adjusting attention heads across layers, enabling targeted pruning and performance enhancement for specific tasks. Our research indicates that a scalable understanding of transformer models is within reach, offering avenues for their refinement and enhancement.
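
    As an illustration of the kind of per-layer, per-head attention analysis the abstract describes, the following minimal sketch (not the authors' code) extracts attention maps from a standard Vision Transformer using PyTorch and the timm library; the model name vit_base_patch16_224 and the hooked attributes (qkv, num_heads, scale) are assumptions based on timm's stock ViT implementation.

    import torch
    import timm

    # Standard ViT-B/16; pretrained=False keeps the sketch self-contained.
    model = timm.create_model("vit_base_patch16_224", pretrained=False)
    model.eval()

    attn_maps = []  # one (heads, tokens, tokens) tensor per layer

    def hook(module, inputs, output):
        # timm's Attention block does not expose its weights, so recompute
        # them from the block input: softmax(q @ k^T * scale).
        x = inputs[0]
        B, N, C = x.shape
        qkv = module.qkv(x).reshape(B, N, 3, module.num_heads, C // module.num_heads)
        q, k = qkv.permute(2, 0, 3, 1, 4)[:2]      # each (B, heads, N, head_dim)
        attn = (q @ k.transpose(-2, -1)) * module.scale
        attn_maps.append(attn.softmax(dim=-1).squeeze(0).detach())

    handles = [blk.attn.register_forward_hook(hook) for blk in model.blocks]

    with torch.no_grad():
        model(torch.randn(1, 3, 224, 224))         # dummy input image

    for h in handles:
        h.remove()

    # Per-layer, per-head score: how strongly each head's CLS token
    # attends to the patch tokens. Scores like these could, in principle,
    # guide the targeted head pruning the abstract mentions.
    for layer, a in enumerate(attn_maps):
        cls_attn = a[:, 0, 1:].mean(dim=-1)        # (heads,)
        print(f"layer {layer:2d}: {[round(v, 4) for v in cls_attn.tolist()]}")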



