
Sentiment Analysis Based On Wikipedia Articles

Posted on: 2019-05-28, Degree: Master, Type: Thesis
Country: China, Candidate: Q H Ye, Full Text: PDF
GTID: 2428330572498247, Subject: Computer technology
Abstract/Summary:
Wikipedia is the most widely used online encyclopedia in the world, and it states that the fundamental principle behind its articles is the "Neutral point of view" (NPOV). However, some researchers have found that Wikipedia articles contain sentimental expressions. Previous research has focused only on articles about specific topics, which gives very limited coverage, and there are few effective approaches for visualizing the sentiment distribution over the entire Wikipedia corpus. Therefore, we performed sentiment analysis on all the English-language Wikipedia articles we could obtain, and proposed an interactive visualization system, WikiSentiViewer, built on the interactive mechanisms of IPython and the visualization capabilities of Folium. The main contributions are as follows:

(1) Sentiment analysis of Wikipedia articles based on the bag-of-words model. First, we examined the structure of a Wikipedia article page and extracted the article content with the help of Wikipedia resources and the parser tool WikiExtractor. We then tokenized each article using natural language processing techniques and calculated a sentiment score for each article with three different sentiment lexicons (LIWC, OL, and MPQA). Each score comprises a positive sentiment score, a negative sentiment score, and a total score. Finally, we analyzed the frequency of articles across different score ranges.

(2) Design and implementation of WikiSentiViewer, a visualization framework for the sentiment distribution. First, we extracted the category, time, and geographical coordinates of each article from DBPedia, and processed the geographical data with the GeoPandas library to assign the articles to countries. Then, we designed the system interface and implemented the interactive functions with the ipywidgets library. Finally, we constructed the geographical sentiment distribution plot with the Folium library to complete the visualization system. The system allows users to choose the sentiment lexicon and to filter the displayed articles by attributes such as category, area, time, and sentiment score range, and it generates the corresponding geographical sentiment distribution plot. The plot displays each article as a circular marker at its location on the map and indicates the article's sentiment score through the size and color of the marker.

(3) Analysis and comparison of the sentiment distribution of Wikipedia articles from three perspectives: geography, time, and sentiment lexicon. First, we analyzed the geographical distribution of sentiment for Wikipedia articles with WikiSentiViewer. Second, we analyzed the temporal distribution of sentiment by plotting the change in sentiment score over time, separately for positive and negative sentiment. Finally, we compared the sentiment score distributions calculated with the three sentiment lexicons to identify the differences between them.

The results show that WikiSentiViewer can effectively visualize the sentiment distribution by geographical location. From the perspective of time, the positive sentiment score of articles about people and events has increased slightly over time. The negative sentiment score has remained almost the same for articles about people, while it has decreased gradually over time for articles about events. From the perspective of sentiment lexicon, the trends of the sentiment distributions obtained with LIWC, OL, and MPQA are basically the same, but the values differ. The scores calculated with the LIWC lexicon are mostly lower than those calculated with the OL lexicon, and both are mostly lower than those calculated with the MPQA lexicon.
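The bag-of-words scoring step in contribution (1) can be illustrated with a minimal Python sketch. The abstract does not give the scoring formula, so the one used here is an assumption: each of the positive and negative scores is the share of an article's tokens found in the corresponding word list, and the total score is their difference. The word lists, the regex tokenizer, and the names `TOY_LEXICON`, `tokenize`, `sentiment_scores`, and `score_histogram` are hypothetical and are not taken from LIWC, OL, MPQA, or the thesis.

```python
import math
import re
from collections import Counter

# Toy word lists for illustration only; NOT entries from LIWC, OL, or MPQA.
TOY_LEXICON = {
    "positive": {"good", "great", "success", "win"},
    "negative": {"bad", "war", "failure", "loss"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def sentiment_scores(text, lexicon):
    """Return (positive, negative, total) scores for one article.

    Assumed scoring (not confirmed by the source): each score is the share
    of tokens found in the matching word list; total = positive - negative.
    """
    counts = Counter(tokenize(text))  # bag of words: word order is discarded
    n_tokens = sum(counts.values())
    if n_tokens == 0:
        return 0.0, 0.0, 0.0
    pos = sum(c for w, c in counts.items() if w in lexicon["positive"]) / n_tokens
    neg = sum(c for w, c in counts.items() if w in lexicon["negative"]) / n_tokens
    return pos, neg, pos - neg


def score_histogram(totals, bin_width=0.1):
    """Count articles per score range, keyed by the lower edge of each bin."""
    hist = Counter()
    for t in totals:
        # Round before flooring so that values on a bin edge are not pushed
        # into the bin below by floating-point error.
        idx = math.floor(round(t / bin_width, 9))
        hist[round(idx * bin_width, 9)] += 1
    return dict(hist)
```

For example, `sentiment_scores("good good bad day", TOY_LEXICON)` returns `(0.5, 0.25, 0.25)`: two of the four tokens are in the positive list and one is in the negative list. Running the same function with a different lexicon dictionary is one simple way to compare lexicons on the same articles.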
Keywords/Search Tags: Wikipedia, DBPedia, sentiment analysis, sentiment lexicon, visualization technology