Fei Peng: Semantic Representations
Source: IR Lab       Posted: 2016/4/8 9:58:07

Semantic Representations

Learning Semantic Representations of Users and Products for Document Level Sentiment Classification

                                                     Duyu Tang et al.

                                                        ACL, 2015

Motivation




- Users

      A critical user might write the review "it works great" but give only 4 stars,

      while a lenient user might give 5 stars for an identical review.

- Products

      Product quality also affects review sentiment ratings.

      Reviews of high-quality products (e.g., MacBook) tend to receive higher ratings than reviews of low-quality products.

Assumption Verification

- User-sentiment consistency

      Sentiment ratings from the same user are more consistent than ratings drawn from different users

- Product-sentiment consistency

      Ratings towards the same product are more consistent than ratings towards different products

- User-text consistency

      A user tends to use personalized sentiment words when expressing opinion polarity or intensity

- Product-text consistency

      A product also has a collection of product-specific words suited to evaluating it.
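The paper verifies these consistency assumptions statistically on review corpora. As one illustration (not the paper's exact test), a minimal sketch for the user-sentiment assumption could compare the rating spread within each user against the spread over the whole corpus; the input format and function name below are assumptions:

from collections import defaultdict
from statistics import mean, pstdev

def user_sentiment_consistency(reviews):
    """Compare rating spread within each user to the corpus-wide spread.

    `reviews` is an assumed input format: a list of (user_id, rating) pairs.
    A mean within-user standard deviation below the global one supports the
    user-sentiment consistency assumption.
    """
    by_user = defaultdict(list)
    for user_id, rating in reviews:
        by_user[user_id].append(rating)
    within = [pstdev(ratings) for ratings in by_user.values() if len(ratings) > 1]
    overall = pstdev([rating for _, rating in reviews])
    return mean(within), overall

# Toy check: the critical user clusters around 4 stars, the lenient one around 5.
reviews = [("critical", 4), ("critical", 4), ("critical", 3),
           ("lenient", 5), ("lenient", 5), ("lenient", 4)]
print(user_sentiment_consistency(reviews))  # within-user spread < global spread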

Model

- User Product Neural Network (UPNN)

- w_i : the i-th word of a review

- u : user-sentiment vector,  p : product-sentiment vector

- U : user-text matrix,  P : product-text matrix

- Semantics of Document

- Word → sentence → document

- Word vectors are pre-trained with word2vec and the sentiment-specific word embeddings of Tang et al., 2014

- A CNN with convolutional filters of multiple widths composes word vectors into sentence vectors

- An average pooling layer produces fixed-length document representations

- Semantics of Users and Products

      User-sentiment & product-sentiment: continuous vectors u and p

      Users with similar rating preferences are mapped to close vectors in the user embedding space (likewise for products)

      User-text & product-text: continuous matrices U and P

      Following composition in distributional models of semantics, one component acts as a matrix that modifies the meaning of another component; here U and P modify the meaning of each word (a sketch follows)
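A minimal numpy sketch of this composition, with illustrative dimensions and randomly initialized parameters standing in for learned ones (the paper's word → sentence → document hierarchy is collapsed into a single convolution-and-pooling pass for brevity):

import numpy as np

rng = np.random.default_rng(0)
d_word, d_hidden, width = 50, 50, 3   # illustrative sizes, not the paper's settings

def document_vector(word_vecs, U, P, conv_W, conv_b):
    """UPNN-style composition sketch: the user-text matrix U and product-text
    matrix P modify each word, a convolutional filter composes neighbouring
    words, and average pooling yields a fixed-length document vector."""
    modified = [np.concatenate([U @ e, P @ e]) for e in word_vecs]
    windows = [np.concatenate(modified[i:i + width])
               for i in range(len(modified) - width + 1)]
    conv_out = [np.tanh(conv_W @ w + conv_b) for w in windows]
    return np.mean(conv_out, axis=0)          # average pooling

# Toy usage: a 6-word review with random stand-in parameters.
words = [rng.standard_normal(d_word) for _ in range(6)]
U = rng.standard_normal((d_word, d_word))     # user-text matrix
P = rng.standard_normal((d_word, d_word))     # product-text matrix
conv_W = rng.standard_normal((d_hidden, width * 2 * d_word))
conv_b = rng.standard_normal(d_hidden)
print(document_vector(words, U, P, conv_W, conv_b).shape)   # (50,)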

Sentiment classification

- Softmax over the concatenation of the document vector with the user-sentiment and product-sentiment vectors:

      softmax(W · [d ; u ; p] + b)
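A small sketch of this prediction layer, with an assumed class count and dimensions (W and b stand in for the learned softmax parameters):

import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def predict_rating(doc_vec, user_vec, prod_vec, W, b):
    """Softmax over the concatenation [d ; u ; p] of document, user-sentiment
    and product-sentiment vectors, giving a distribution over rating classes."""
    x = np.concatenate([doc_vec, user_vec, prod_vec])
    return softmax(W @ x + b)

# Toy usage: 5 rating classes, illustrative dimensions.
rng = np.random.default_rng(1)
d, u, p = rng.standard_normal(50), rng.standard_normal(10), rng.standard_normal(10)
W, b = rng.standard_normal((5, 70)), rng.standard_normal(5)
print(predict_rating(d, u, p, W, b).round(3))   # probabilities summing to 1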

Datasets

- Three review-rating datasets are used: IMDB, Yelp 2013, and Yelp 2014

- MAE and RMSE measure the divergence between predicted sentiment ratings (pr) and ground-truth ratings (gd)
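For reference, both metrics are straightforward to compute; a minimal sketch:

import numpy as np

def mae_rmse(pred, gold):
    """MAE and RMSE between predicted ratings (pr) and ground-truth ratings (gd)."""
    pred, gold = np.asarray(pred, dtype=float), np.asarray(gold, dtype=float)
    mae = np.abs(pred - gold).mean()
    rmse = np.sqrt(((pred - gold) ** 2).mean())
    return mae, rmse

print(mae_rmse([4, 5, 3, 2], [5, 5, 2, 2]))   # (0.5, 0.7071...)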

Baseline methods

- Majority

       assign the majority sentiment category of the training set to each test instance

- Trigram

      unigrams, bigrams, and trigrams as features + SVM

- TextFeature

      hand-crafted features (word/character n-grams, lexicon features) + SVM

- word2vec

       averaged word embeddings + SVM (see the sketch after this list)

- SSWE

       sentiment-specific word embeddings + SVM

- RNTN+RNN

       represent sentences with an RNTN and compose the document with a recurrent neural network

- Paragraph Vector

       PV-DM (its code is not officially released)

- JMARS

       a topic-modeling approach that leverages user and aspect information
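As referenced above, the word2vec / SSWE baselines average word embeddings into a document feature and train an SVM on it. A minimal sketch with scikit-learn; the embedding table `emb` and the toy data are made up for illustration:

import numpy as np
from sklearn.svm import LinearSVC

def doc_feature(tokens, emb, dim):
    """Average the embeddings of in-vocabulary tokens (zero vector if none)."""
    vecs = [emb[t] for t in tokens if t in emb]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

# Tiny made-up embedding table and training documents.
dim = 4
emb = {"great": np.ones(dim), "terrible": -np.ones(dim), "works": 0.5 * np.ones(dim)}
docs = [["it", "works", "great"], ["terrible"], ["great", "great"], ["terrible", "works"]]
labels = [1, 0, 1, 0]

X = np.stack([doc_feature(d, emb, dim) for d in docs])
clf = LinearSVC().fit(X, labels)
print(clf.predict(X))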

Results


- Word embeddings tailored to each corpus perform slightly better than general-purpose word embeddings

- SSWE performs better than context-based word embeddings by incorporating the sentiment information of texts

- The full UPNN model yields the best performance on all three datasets

Model analysis

- Removing the vector or matrix representations

      Vector-based representations are more effective,

    because user-sentiment and product-sentiment are more directly associated with sentiment labels,

    and vector representations have far fewer parameters than matrix representations (see the count below)
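For a rough sense of scale (the dimension here is an illustrative choice, not the paper's setting): a user-sentiment vector has d parameters while a user-text matrix has d × d, so with d = 100 that is 100 vs. 10,000 parameters per user, and likewise per product.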

Discussion

- Out-of-vocabulary users and products

      A user or a product encountered at testing/decoding time may never have been seen in the training data

      avg UP : back off to the averaged representation of all users/products seen in training

      unk UP : back off to a shared representation learned from low-frequency users/products
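A minimal sketch of such a lookup with both fallback strategies (names and vectors below are made up for illustration):

import numpy as np

def lookup_user(user_id, user_emb, avg_user, unk_user, strategy="avg"):
    """Fetch a user representation, backing off for users unseen in training.

    avg_user: average of all trained user vectors ("avg UP").
    unk_user: a shared vector trained on low-frequency users ("unk UP").
    """
    if user_id in user_emb:
        return user_emb[user_id]
    return avg_user if strategy == "avg" else unk_user

# Toy usage with made-up vectors; products are handled the same way.
user_emb = {"alice": np.array([0.2, -0.1]), "bob": np.array([0.4, 0.3])}
avg_user = np.mean(list(user_emb.values()), axis=0)
unk_user = np.zeros(2)                      # stands in for a learned <unk> vector
print(lookup_user("carol", user_emb, avg_user, unk_user))           # avg UP fallback
print(lookup_user("carol", user_emb, avg_user, unk_user, "unk"))    # unk UP fallback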