This research investigates the use of sentiment analysis technologies to extract the affective states of users from text. Testing the affect detection algorithms required a dataset of text annotated for affect by human raters. No such dataset was available, so data from the e-community Greenwire was annotated for affect, which also gave us insight into the agreement between human raters on affect annotation. The affect detection algorithms were trained on Twitter data that was labeled automatically using emoticons, and were tested on various datasets, including the annotated Greenwire dataset.
In e-communities, users can interact without the limits of spatial presence and established social relationships. Greenwire is the e-community by and for Greenpeace International (2012), where volunteers and employees of Greenpeace communicate, interact, share information, and discuss topics. Users of Greenwire can form and join groups; create events and invite people to them; give shouts in various locations (homepage, group pages, and user pages); and post blogs, ideas, news, and photos.
Greenwire is an example of how offline communities that are connected with large institutions are transforming into hybrid communities where members interact both offline and online. The move to hybrid communities is gaining ground with everyday institutions. For instance, it seems that in the coming decades health and education institutions will transform into hybrid communities where most interaction takes place online (Moore and Kearsley, 2011; Hardiker and Grant, 2011).
The growth in e-communities is a motive to study online interaction in these environments from a natural language processing (NLP) perspective. Studying online interaction with NLP is a challenging task because most interactions will likely be rich with emotional and subjective content. NLP, like most computer science fields, has had great success with applications that process facts and logic expressed in natural language, but most of the challenges which remain deal with human factors.
Communicating the applications of affect detection for e-communities could encourage NLP research on automatically detecting the affective states of users, which could lead to better methods and resources. The following list gives novel applications related to the task of detecting the affect of users in an e-community:
- Detecting users whose interactions lead to positive feelings in other users and giving them more responsibilities and rights.
- Detecting positive groups within the community which could be a reason for the institution to give support to these groups.
- Detecting users who intentionally try to discourage or disrupt activity in the community, called flamers or trolls.
- Recognizing interactions in the e-community that often lead to negative emotions and designing or moderating them differently.
- Detecting groups with many negative users and moderating them more closely.
- Gaining insight into the health of the e-community by watching how users feel over a period of time.
- Giving moderators of an e-community search functionality to find content based on values of affect.
These applications could improve the monitoring of user interactions in e-communities and contribute to keeping the community healthy and active. However, developing and evaluating these applications requires technologies that can detect the affect of users in an e-community.
Sentiment analysis is a popular research field that focuses on detecting subjective text. Much effort has gone into sentiment analysis for opinion-rich text, such as reviews or blogs (Pang and Lee, 2008). The reason is that there are many commercial applications for detecting opinions. One example is finding possible product improvements in online consumer reviews by automatically detecting the polarity expressed towards attributes of the products. It seems that opinions and reviews are closely related to affect, and that the technologies developed and tested for detecting sentiment could be used to detect affect.
1.1 State of the art for applying affect detection on e-communities
Chmiel et al. (2011a) applied sentiment analysis technologies to detect the polarity of discussions in three public domains. The first domain is the BBC fora, where discussions that started with emotionally rich content tended to be longer. This effect was especially strong for negative messages (Chmiel et al., 2011b).
Thelwall, Buckley, and Paltoglou (2012) evaluated an unsupervised lexicon-based approach to detect polarity on social sites including Twitter, MySpace, and YouTube. Their evidence suggests that a lexical approach can perform well for polarity classification across multiple social network domains. Thelwall et al. argue that a trained algorithm can learn topic-related words as indicators of sentiment; for example, “Iraq” and “Israel” will likely be learned as indicators of negative sentiment when the algorithm is trained on news data. It seems that a supervised algorithm trained on a large amount of domain-independent data would not learn to use such topic words as strong indicators of sentiment.
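To make the idea of an unsupervised lexicon-based classifier concrete, the following is a minimal sketch. It is not the method evaluated by Thelwall et al.; the lexicon entries, their weights, the negation rule, and the function name `polarity` are all assumptions made for illustration. Because the lexicon is fixed in advance and nothing is learned from training data, topic words such as “Iraq” cannot become sentiment indicators by accident.

```python
# Toy sentiment lexicon: words and weights are invented for illustration only.
LEXICON = {"good": 1, "great": 2, "love": 2, "bad": -1, "awful": -2, "hate": -2}

# Assumed negation words; a negation flips the weight of the next token.
NEGATIONS = {"not", "no", "never"}


def polarity(text):
    """Classify text as 'positive', 'negative', or 'neutral' by summing lexicon weights."""
    score = 0
    negate = False
    for token in text.lower().split():
        word = token.strip(".,!?;:")
        if word in NEGATIONS:
            negate = True
            continue
        weight = LEXICON.get(word, 0)  # words outside the lexicon contribute nothing
        score += -weight if negate else weight
        negate = False  # negation only applies to the token directly after it
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

A sentence with no lexicon words, such as a plain statement about a news topic, is classified as neutral regardless of the topic it mentions.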
Fan, Zhao, Chen, and Xu (2013) categorized emotions on the social network Weibo into “anger”, “joy”, “sadness” and “disgust”. Their results indicate that anger spread among users more strongly than joy, whereas sadness and disgust showed little contagion. The idea behind emotion contagion is that people who communicate are influenced by one another’s emotions; for example, when “talking to a depressed person we may feel depressed” (Hatfield and Cacioppo, 1994). Emotion contagion has been widely researched in the psychology literature, and it seems that emotions can also spread in e-communities. A recent study shows that emotions spread on Facebook when users are exposed to other people’s emotions in their news feeds, without direct user interaction (Kramer, Guillory, and Hancock, 2014).
De Choudhury, Counts, and Gamon (2012) analyzed the polarity and arousal dimensions in social networks by deriving 200 words related to the mood of a user (e.g. “excited”, “mad”). Twitter posts that carried one of these mood words as a hashtag were labeled with that mood. De Choudhury, Counts, and Gamon used Amazon’s Mechanical Turk to evaluate whether the tweets labeled in this way captured the mood of the user. The results show that 83% of the tweets contained the mood predicted by the hashtags. Furthermore, De Choudhury et al. found that users who were socially active posted more positive tweets.
1.2 Objectives and research questions
Sentiment analysis of e-communities has mostly been applied to material that is posted publicly, for example on discussion boards (Chmiel et al., 2011a), social networks (Fan et al., 2013; Pak and Paroubek, 2010), and blogs (Chmiel et al., 2011a). One benefit of obtaining private data for research purposes is that people are likely to interact differently in private than in public because of privacy and trust issues. To our knowledge, no study has applied sentiment analysis to data from a private e-community where the main interest of members is communicating rather than publishing their opinions. The focus of this research is on such a community.
The aim is neither to improve state-of-the-art sentiment analysis technologies nor to build and evaluate the applications mentioned above for e-communities. Instead, we wish to take a step towards developing these applications by researching technologies that can detect the affective state of users in e-communities. We do so by testing the hypotheses in table 1.1.
Russell (1980) describes affect with a “circumplex” model consisting of a valence dimension (between positive and negative) and an activity dimension (between low and high arousal), together called core affect. In this research these dimensions are referred to as the polarity and arousal dimensions. There is much evidence from neurophysiology studies that we experience feelings and emotions related to these dimensions as a continuous affective state (Russell, 2003). On the other hand, there is little evidence that the human brain is wired for discrete emotions (e.g. anger and sadness). Therefore, this research focuses on detecting the affective state of users along the polarity and arousal dimensions rather than on discrete emotional outbursts.
Table 1.1: Hypotheses

| Hypothesis | Statement |
|---|---|
| H1 | We can automatically detect core affect of users in the Greenwire community by applying computational techniques from sentiment analysis. |
| H2 | Twitter data can be used as domain-independent data for training algorithms for e-community purposes. |
| H3 | Features from sentiment analysis give a similar performance when detecting core affect as when detecting opinions. |
| H4 | Core affect of users in the Greenwire community can be labeled accurately by human annotators. |
The approach taken to validate H4 was to annotate content from the e-community of Greenpeace (Greenwire) for affect. The Greenwire dataset and other datasets were used to test the affect detection algorithms; these datasets are discussed in chapter 3. The annotation process was conducted in two stages, which are reported in chapter 4. The first stage was a pilot study in which 10 people annotated the same content from the Greenwire dataset for 15 minutes. The second stage was to alter the procedure and annotate the full dataset with 3 people. The resulting annotated Greenwire dataset was used to test the affect detection algorithms.
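Whether human annotators label affect consistently is usually quantified with an agreement statistic. This chapter does not say which measure is used, so the following is only a sketch of one common option, Cohen's kappa averaged over all pairs of annotators. The function names and the example labels are assumptions for illustration and do not come from the Greenwire data.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two annotators who labeled the same items in the same order."""
    n = len(a)
    # Fraction of items on which the two annotators agree.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


def mean_pairwise_kappa(annotations):
    """Average Cohen's kappa over every pair of annotators."""
    pairs = list(combinations(annotations, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


# Hypothetical polarity labels from three annotators for four posts.
example = [
    ["pos", "neg", "pos", "neg"],
    ["pos", "neg", "pos", "neg"],
    ["pos", "neg", "pos", "neg"],
]
print(mean_pairwise_kappa(example))
```

A kappa of 1 means perfect agreement and a kappa of 0 means agreement no better than chance, which makes the statistic more informative than raw percent agreement when one label dominates.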
An automatic labeling approach was taken to create training datasets for the polarity and arousal detection algorithms: emoticons were used to label Twitter data with polarity and arousal values. The methodology is discussed in chapter 5 and the implementation is reported in chapter 6. The affect detection algorithms are tested on various datasets and the results are reported in chapter 7. Furthermore, chapter 2 contains background information and chapter 8 gives the conclusions.
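The emoticon-based labeling can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation reported in chapter 6: the emoticon table, the polarity and arousal values assigned to each emoticon, and the function name `label_tweet` are all hypothetical.

```python
# Hypothetical mapping from emoticon to (polarity, arousal).
# Both the emoticons chosen and their values are illustrative assumptions.
EMOTICON_LABELS = {
    ":)": ("positive", "low"),
    ":D": ("positive", "high"),
    ":(": ("negative", "low"),
    ">:(": ("negative", "high"),
}


def label_tweet(text):
    """Return (cleaned_text, polarity, arousal), or None if the tweet cannot be labeled.

    A tweet is labeled only when all of its emoticons agree on a single label.
    The emoticons are removed from the text so that a classifier trained on
    the result cannot simply memorize them.
    """
    tokens = text.split()  # whole-token matching keeps ">:(" distinct from ":("
    labels = {EMOTICON_LABELS[t] for t in tokens if t in EMOTICON_LABELS}
    if len(labels) != 1:
        return None  # no emoticon, or conflicting emoticons
    polarity, arousal = labels.pop()
    cleaned = " ".join(t for t in tokens if t not in EMOTICON_LABELS)
    return cleaned, polarity, arousal


tweets = ["great day at the rally :D", "train cancelled again :(", "no emoticon here"]
training_data = [labeled for labeled in map(label_tweet, tweets) if labeled is not None]
```

Discarding tweets with no emoticon or with conflicting emoticons trades dataset size for cleaner labels; whether that trade-off matches the procedure actually used is not stated in this chapter.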