
Research On Evaluating Text Sentiment Analysis Toward People

Posted on: 2017-02-19
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X X Zhu
Full Text: PDF
GTID: 1108330488461991
Subject: Computer application technology
Abstract/Summary:
In general, text sentiment analysis analyzes and summarizes the opinions and sentiment expressed in subjective text using techniques such as annotation, classification, clustering, and extraction. Current research on text sentiment analysis mainly covers building sentiment corpora, subjectivity classification, opinion orientation judgment, opinion target and opinion word extraction, and opinion summarization. With the popularity of mobile Internet applications, public opinion analysis and product evaluation analysis, both of which are built on text sentiment analysis, will play an increasingly broad and important role.

Although text sentiment analysis has progressed rapidly in recent years, many problems still need to be identified and some current methods need to be improved. For example, research on evaluation text toward people has received less attention. Compared with evaluation text toward products, evaluation text toward people shows more extreme emotional intensity and more varied expression.

In this dissertation, we use machine learning and data mining methods to study sentiment analysis of evaluation text toward people, focusing on several key issues. The contributions of this dissertation are as follows:

1. A scheme for constructing a corpus of evaluation text toward people is designed, based on multi-classifier fusion and active learning. Traditional manual corpus annotation is time-consuming and costly, so machine learning is applied to build the corpus at lower cost. In this dissertation, an annotation method based on multi-classifier fusion is implemented. Starting from a small amount of annotated text, a corpus of evaluation text toward people, annotated with positive or negative polarity, is constructed.
Moreover, because dirty words are widespread in web text toward people, a novel method for constructing a high-quality corpus based on active learning is designed, and a high-quality corpus of text containing dirty words is built with it. Using this corpus, text containing dirty words is identified with high accuracy, which improves the accuracy and recall of automatic negative-text identification and helps social websites filter such text accurately.

2. A two-layer-architecture approach is proposed for classifying people using a knowledge base and an information retrieval system. Sentiment analysis depends strongly on the specific domain. For example, a text sentiment analysis model built for car evaluations cannot be applied directly to movie evaluations. Likewise, the words and sentences used to evaluate different kinds of people differ greatly. If a person's category can be determined accurately before sentiment analysis, the subsequent text analysis becomes easier. The two-layer-architecture approach addresses this problem. Furthermore, an effective news selection algorithm based on LDA is designed to discard noisy and uncertain news, so that only relevant news is added to the training set. Experiments demonstrate the effectiveness and accuracy of this method.

3. An effective method is presented to extract opinion targets and opinion words based on maximum weight matching in an edge-weighted bipartite graph. Extracting opinion targets and opinion words is important for fine-grained sentiment analysis. To the best of our knowledge, most studies extract opinion targets and opinion words separately and only afterwards establish the relationship between them.
As a result, the modification and restriction relations between opinion targets and opinion words have not yet been fully considered. In this dissertation, a bipartite-graph-based extraction method is designed in which opinion targets and opinion words form the two vertex sets of a bipartite graph. The strength of the relation between each opinion target and opinion word is calculated and used as the weight of the corresponding edge. The maximum weight matching in the bipartite graph is then computed using the classic Hungarian algorithm and the Kuhn-Munkres algorithm. Finally, opinion target and opinion word pairs are obtained by filtering. Sentential PMI is designed to measure the co-occurrence of two phrases. Compared with traditional PMI, sentential PMI represents the relationship between opinion target and opinion word more subtly, because it combines phrases with their POS tags and uses co-occurrence within a sentence instead of co-occurrence within a document. Moreover, sentential PMI values are rewarded or penalized according to the distance between the two phrases in the sentence.

Finally, experiments based on the techniques above are carried out. They mainly mine the opinion targets and opinion words for different kinds of people and summarize the opinion targets of positive and negative evaluation text.

In conclusion, several key issues in sentiment analysis of evaluation text toward people are studied in depth. First, corpora of people evaluation text and of dirty-word text are constructed. Second, a novel method is designed and implemented for classifying people using a knowledge base and an information retrieval system. Third, an effective and systematic method for extracting opinion targets and opinion words, based on a bipartite graph and sentential PMI, is designed.
We believe that this research offers a valuable reference for research on product evaluation and public opinion analysis.
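
To illustrate the extraction step in contribution 3, the following is a minimal sketch under stated assumptions, not the dissertation's implementation. The abstract does not give the sentential PMI formula or the distance adjustment, so the sentence-level probability estimate, the `1 / mean_distance - 0.5` bonus, the threshold filter, and the toy corpus are all hypothetical. For brevity, an exhaustive search over permutations stands in for the Hungarian/Kuhn-Munkres algorithm; it finds the same maximum weight matching on tiny inputs, while a real system would use a polynomial-time solver such as `scipy.optimize.linear_sum_assignment`.

```python
import math
from itertools import permutations

# Hypothetical toy corpus: each sentence is a list of (phrase, POS) pairs.
SENTENCES = [
    [("acting", "n"), ("brilliant", "adj")],
    [("acting", "n"), ("brilliant", "adj"), ("temper", "n"), ("bad", "adj")],
    [("temper", "n"), ("bad", "adj")],
    [("voice", "n"), ("sweet", "adj")],
]


def sentential_pmi(target, word, sentences):
    """PMI with co-occurrence counted per sentence rather than per document."""
    n = len(sentences)
    has_target = sum(1 for s in sentences if target in s)
    has_word = sum(1 for s in sentences if word in s)
    both = [s for s in sentences if target in s and word in s]
    if not both:
        return 0.0  # assumption: pairs that never co-occur get zero weight
    pmi = math.log2((len(both) / n) / ((has_target / n) * (has_word / n)))
    # Hypothetical distance adjustment: adjacent phrases are rewarded (+0.5),
    # phrases more than two positions apart are penalized.
    mean_dist = sum(abs(s.index(target) - s.index(word)) for s in both) / len(both)
    return pmi + (1.0 / mean_dist - 0.5)


def max_weight_matching(targets, words, weight):
    """Exhaustive maximum weight bipartite matching.

    Stand-in for Kuhn-Munkres; assumes len(targets) <= len(words)
    and is only practical for very small vertex sets.
    """
    best_score, best_pairs = float("-inf"), []
    for perm in permutations(words, len(targets)):
        pairs = list(zip(targets, perm))
        score = sum(weight[p] for p in pairs)
        if score > best_score:
            best_score, best_pairs = score, pairs
    return best_pairs


def extract_pairs(sentences, threshold=0.0):
    """Match noun targets to adjective opinion words, then filter weak edges."""
    vocab = {tok for s in sentences for tok in s}
    targets = sorted(t for t in vocab if t[1] == "n")
    words = sorted(t for t in vocab if t[1] == "adj")
    weight = {(t, w): sentential_pmi(t, w, sentences)
              for t in targets for w in words}
    matched = max_weight_matching(targets, words, weight)
    return [(t[0], w[0]) for t, w in matched if weight[(t, w)] > threshold]


if __name__ == "__main__":
    print(extract_pairs(SENTENCES))
```

Treating nouns as candidate targets and adjectives as candidate opinion words is a simplification made only for this example. Matching on the whole graph, rather than picking the best word for each target independently, prevents two targets from claiming the same opinion word.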
Keywords/Search Tags: Chinese text sentiment analysis, evaluating text toward people, corpus, character of people classification, opinion target, opinion word, bipartite graph