Sentiment Recognition using Lexicons (Jurafsky & Martin)
As presented in Jurafsky & Martin, Chapter 20: Lexicons for Sentiment, Affect, and Connotation.
In the simplest case, lexicons can be used when we don't have sufficient training data to build a supervised sentiment analyzer; in such situations a lexicon can drive a rule-based algorithm. The simplest rule of all just uses the ratio of positive to negative words: if the ratio exceeds a threshold \(\lambda\), the document is classified as positive.
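This rule can be sketched in a few lines. The tiny lexicons below are hypothetical stand-ins; real lexicons (e.g. MPQA or the General Inquirer) are far larger:

```python
# Minimal rule-based sentiment classifier using a lexicon.
# POSITIVE/NEGATIVE are made-up toy lexicons for illustration.
POSITIVE = {"good", "great", "excellent", "happy", "love"}
NEGATIVE = {"bad", "terrible", "awful", "sad", "hate"}

def classify(tokens, lam=1.0):
    """Label a document positive if the ratio of positive to
    negative lexicon hits exceeds the threshold lam."""
    pos = sum(1 for t in tokens if t in POSITIVE)
    neg = sum(1 for t in tokens if t in NEGATIVE)
    # Add-one smoothing so a zero count doesn't break the ratio.
    return "positive" if (pos + 1) / (neg + 1) > lam else "negative"
```

With add-one smoothing, a document with no negative hits still yields a finite ratio rather than a division by zero.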
If supervised training data is available, then we might add scores computed from sentiment lexicons, weighted or normalized in some way, as features in a classifier along with other lexical/non-lexical features.
Generalizing across sentiment and other kinds of affective meaning, if the training set is large enough, simply using all the words or bigrams as features in a powerful classifier like SVM or logistic regression can be hard to beat. We can often treat affective meaning classification as a simple case of document classification.
Sometimes modifications need to be made, though. For example, if the dataset is very large, feature filtering may be required; one approach is to filter based on pointwise mutual information (PMI). Various feature weights can also be used, including raw counts, normalized probabilities, or log probabilities.
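A minimal sketch of PMI-based feature filtering, estimating PMI between a word and a class label from document-level co-occurrence counts (the corpus and labels below are invented for illustration):

```python
import math
from collections import Counter

def pmi_filter(docs, labels, target_label, top_k):
    """Rank vocabulary by PMI(w, c) = log2(P(w, c) / (P(w) P(c))),
    estimated over documents, and keep the top_k words."""
    n_docs = len(docs)
    word_count = Counter()   # number of docs containing w
    joint_count = Counter()  # number of docs containing w with label c
    n_label = sum(1 for y in labels if y == target_label)
    for doc, y in zip(docs, labels):
        for w in set(doc):
            word_count[w] += 1
            if y == target_label:
                joint_count[w] += 1
    p_c = n_label / n_docs
    scores = {}
    for w, cw in word_count.items():
        p_w = cw / n_docs
        p_wc = joint_count[w] / n_docs
        if p_wc > 0:  # PMI is undefined (negative infinity) at zero
            scores[w] = math.log2(p_wc / (p_w * p_c))
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

Words that never co-occur with the target class are simply dropped, since their PMI is undefined.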
If training data is sparser, any of the lexicons may be helpful, alone or in combination with words/n-grams.
To get values for lexicon features we might take just an indicator function, where the value of feature \(f_L\) is 1 if the text has any word from the relevant lexicon. Or it could be the count of word tokens from the lexicon that occur in the document. Or, if the lexicon has weights, the weights might be used to multiply the counts.
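The three feature variants can be computed side by side; the weights in this toy lexicon are made up, not taken from any real resource:

```python
# Three ways to turn a sentiment lexicon into classifier features.
# LEXICON maps words to invented sentiment weights for illustration.
LEXICON = {"good": 0.8, "great": 0.9, "bad": -0.7, "awful": -0.9}

def lexicon_features(tokens):
    hits = [t for t in tokens if t in LEXICON]
    return {
        "has_lex_word": int(bool(hits)),                # indicator: any hit at all
        "lex_count": len(hits),                         # count of lexicon tokens
        "lex_weighted": sum(LEXICON[t] for t in hits),  # counts scaled by weights
    }
```

Any or all of these values can then be appended to the feature vector alongside word or n-gram features.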
Entity Centric Affect Recognition
Sometimes we might not want to get a score for the whole document, but for a particular entity in the text.
One method combines affect lexicons with contextual embeddings to assign an affect score to an entity in text. It is presented in Field and Tsvetkov (2019).
The algorithm trains classifiers to map embeddings to scores:
For each word w in the training corpus:
(a) Use off-the-shelf pretrained encoders (like BERT) to extract a contextual embedding e for each instance of the word. No additional fine-tuning is done.
(b) Average over the e embeddings of each instance of w to obtain a single embedding vector for one training point w.
(c) Use the NRC VAD Lexicon to get scores for w.
Then train three regression models on all words w, one each to predict the V, A, and D scores from a word's average embedding.
Now, given a mention m of an entity in a text, we assign affect scores as follows:
Use the same pretrained LM to get a contextual embedding for m in context.
Feed this embedding through the three regression models to get V, A, and D scores for the entity.
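The whole pipeline can be sketched end to end. This is a toy stand-in, not the authors' implementation: random vectors replace the BERT contextual embeddings, the VAD scores are invented rather than taken from the NRC VAD Lexicon, and a single least-squares fit stands in for the three regression models:

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 16  # real BERT embeddings are 768-dimensional

# Steps (a)+(b): average contextual embeddings per training word (mocked
# here as one random vector per word).
train_words = ["joy", "fear", "calm", "rage"]
avg_emb = {w: rng.normal(size=DIM) for w in train_words}

# Step (c): hypothetical Valence/Arousal/Dominance scores in [0, 1].
vad = {
    "joy": (0.95, 0.70, 0.60), "fear": (0.10, 0.85, 0.20),
    "calm": (0.80, 0.10, 0.55), "rage": (0.15, 0.95, 0.70),
}

# Fit linear regressions from embeddings to the three affect dimensions.
X = np.stack([avg_emb[w] for w in train_words])
Y = np.array([vad[w] for w in train_words])
W, *_ = np.linalg.lstsq(X, Y, rcond=None)  # least-squares weights

def score_mention(mention_embedding):
    """Map a contextual embedding of an entity mention to (V, A, D)."""
    return mention_embedding @ W
```

At inference time, the embedding of an entity mention in context is pushed through the fitted models to produce its three affect scores.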