Recently, the International Joint Conference on Artificial Intelligence (IJCAI 2024) announced its paper acceptance results, and PhD student Zhixing Lu's (逯志兴) work on the orthogonality of parameter vectors was accepted as a full paper. IJCAI is a top-tier conference in artificial intelligence, recommended by the CCF as a Class A international academic conference.
Title: The Orthogonality of Weight Vectors: The Key Characteristics of Normalization and Residual Connections
Abstract: Normalization and residual connections are used extensively in deep neural networks and contribute significantly to their strong performance, yet the precise reasons for this improvement have remained unclear. Our theoretical analysis reveals that normalization and residual connections enhance the orthogonality of the weight vectors in deep neural networks. This, in turn, drives the Gram matrix of the network weights toward strict diagonal dominance, thereby strengthening the network's capacity for feature learning. We also design the parameter independence index (PII) to precisely characterize the orthogonality of parameter vectors. Alongside our theoretical findings, we conduct empirical validation on prevalent network models, including fully connected networks (FNNs), convolutional neural networks (CNNs), Transformers, and Transformer-based pre-trained language models (PLMs) and large language models (LLMs). Finally, we find that the LoRA fine-tuning technique preserves the orthogonality of parameter vectors, a result of practical importance for fine-tuning LLMs.
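The announcement does not give the exact formula for the PII, so the sketch below is only an illustrative assumption: it scores a layer's weight matrix by how close the normalized Gram matrix of its row vectors is to the identity, which is the kind of quantity the abstract describes (orthogonal weight vectors give a diagonally dominant Gram matrix). The function name `orthogonality_index` and the averaging scheme are hypothetical, not the paper's definition.

```python
# A minimal sketch of an orthogonality measure for weight vectors, assuming
# "parameter independence" is captured by how close the cosine-similarity
# Gram matrix of a layer's row weight vectors is to the identity matrix.
# This is an illustrative stand-in, NOT the paper's exact PII formula.
import numpy as np

def orthogonality_index(W: np.ndarray) -> float:
    """Return a score in [0, 1]; 1 means the rows of W are pairwise orthogonal.

    W: a layer's weight matrix whose rows are the parameter vectors.
    """
    # Normalize each row to unit length so the Gram matrix has a unit diagonal.
    rows = W / np.linalg.norm(W, axis=1, keepdims=True)
    gram = rows @ rows.T                      # cosine-similarity Gram matrix
    off_diag = gram - np.diag(np.diag(gram))  # keep only off-diagonal entries
    n = W.shape[0]
    # Average absolute off-diagonal cosine: 0 for perfectly orthogonal rows.
    mean_abs = np.abs(off_diag).sum() / (n * (n - 1))
    return 1.0 - mean_abs

# Example: random Gaussian rows in high dimension are nearly orthogonal,
# so the index should be close to 1.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 4096))
print(f"orthogonality index: {orthogonality_index(W):.4f}")
```

Under this kind of measure, the abstract's claims become directly checkable: one can compare the index across layers of networks trained with and without normalization or residual connections, or evaluate it on a weight matrix before and after merging a LoRA update, to test whether fine-tuning preserves the orthogonality of the parameter vectors.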