
Research On Expansion And Optimization Of Sentiment Lexicon Based On Internet

Posted on: 2012-11-23
Degree: Master
Type: Thesis
Country: China
Candidate: X Zhou
Full Text: PDF
GTID: 2268330425997259
Subject: Computer application technology
Abstract/Summary:
With the development of network technology, the Internet has become an important source of information, but when people face huge amounts of Internet data, quickly and accurately mining opinion data becomes a challenge. Opinion data is important and valuable to governments, enterprises and individuals. These challenges and demands drive the development of sentiment analysis. Sentiment analysis is divided into the word level, the sentence level and the document level, and word-level sentiment analysis is the foundation. This thesis focuses on word-level sentiment analysis, namely the expansion and optimization of sentiment lexicons using the Internet.

Firstly, this thesis presents a word polarity classification method for extracting common polar words from an Internet corpus. The method turns the polar word extraction problem into a word polarity classification problem. After analyzing the characteristics of polar words, we use a maximum entropy classifier to classify the sentiment polarity of words. We define three sentiment polarities (positive, negative and objective), use search engines to obtain an unlabeled corpus for the candidate words, and extract rich features from this large corpus. Finally, we choose the best combination of features for word polarity classification through experiments. The results show that the previous and next words are the most effective features, and that these features essentially capture the linguistic phenomena of modification and collocation. With the best combination of features, word polarity classification performance reaches 95.9%.

Secondly, for the expansion and optimization of the domain sentiment lexicon, we work on two parts: the extraction of collocations for polar words and the extraction of polar multi-word terms. For collocation extraction, this thesis proposes two extraction frames: an MI frame and a template frame.
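The abstract does not say how the MI frame scores candidates. As a rough illustration only, assuming "MI" refers to pointwise mutual information over adjacent word pairs, a scoring step might look like the sketch below. The function name `pmi_scores` and the toy review sentences are hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter


def pmi_scores(sentences):
    """Score adjacent word pairs by pointwise mutual information (PMI).

    sentences: list of token lists. Returns {(left, right): PMI in bits}.
    Assumption: candidate collocations are adjacent word pairs.
    """
    word_counts = Counter()
    pair_counts = Counter()
    for tokens in sentences:
        word_counts.update(tokens)
        pair_counts.update(zip(tokens, tokens[1:]))

    total_words = sum(word_counts.values())
    total_pairs = sum(pair_counts.values())

    scores = {}
    for (left, right), count in pair_counts.items():
        p_pair = count / total_pairs
        p_left = word_counts[left] / total_words
        p_right = word_counts[right] / total_words
        # PMI compares the observed pair probability with the probability
        # expected if the two words occurred independently.
        scores[(left, right)] = math.log2(p_pair / (p_left * p_right))
    return scores


# Hypothetical toy corpus of tokenized review sentences.
toy_sentences = [
    ["battery", "life", "is", "great"],
    ["battery", "life", "is", "short"],
    ["screen", "is", "great"],
]
pmi = pmi_scores(toy_sentences)
```

On this toy corpus the pair ("battery", "life") scores higher than ("is", "great"), because "is" is frequent and pairs with many different words. PMI estimates are unreliable for rare pairs, so a real pipeline would probably need a minimum frequency threshold.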
The two frames cover three extraction methods: an MI-based method, a string-template-based method and a parse-tree-template-based method. The experimental results show that the collocation extraction method based on the template frame is more efficient when the corpus is small.

The final task is polar multi-word term extraction. In current research, the basic polar elements are usually words, but in studying online product reviews we found that people sometimes use a multi-word term instead of a single word to express polarity in a review sentence, so we also take polar multi-word term extraction as a research focus. We use parse subtrees to generate candidate multi-word terms, then use C-value to select among candidates of different lengths. We also propose an exclusion method, based on the hypothesis that polar elements are mutually exclusive within one review sentence, to purify the polar multi-word terms. Finally, we judge polarity using labeled online corpora and search engines, and we analyze the advantages and disadvantages of each method through experiments.
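To make the C-value step more concrete, here is a minimal sketch of the standard C-value measure: a candidate's frequency is weighted by the log of its length, and a candidate nested inside longer candidates is discounted by the average frequency of those longer candidates. The abstract does not say which variant the thesis uses, so this is an assumption. The function names and the frequency table are hypothetical.

```python
import math


def contains(longer, shorter):
    """Return True if `shorter` occurs as a contiguous word sequence in `longer`."""
    m = len(shorter)
    return any(longer[i:i + m] == shorter for i in range(len(longer) - m + 1))


def c_values(freq):
    """Compute C-value for each candidate term.

    freq: dict mapping a candidate term (tuple of words) to its corpus frequency.
    Single-word candidates score 0 because log2(1) = 0.
    """
    scores = {}
    for term, f_term in freq.items():
        # Frequencies of longer candidates that contain this term.
        nested_in = [
            f_other
            for other, f_other in freq.items()
            if len(other) > len(term) and contains(other, term)
        ]
        adjusted = f_term
        if nested_in:
            # Discount occurrences that are explained by longer terms.
            adjusted = f_term - sum(nested_in) / len(nested_in)
        scores[term] = math.log2(len(term)) * adjusted
    return scores


# Hypothetical candidate frequencies, as if taken from parse subtrees.
toy_freq = {
    ("cost", "effective"): 10,
    ("very", "cost", "effective"): 4,
    ("not", "cost", "effective"): 2,
}
cvals = c_values(toy_freq)
# ("cost", "effective"): log2(2) * (10 - (4 + 2) / 2) = 7.0
```

Here the two-word candidate keeps the highest score even after the discount, so it would be preferred over its longer variants. With different counts a longer candidate can win, which is how the measure performs length selection.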
Keywords/Search Tags:sentiment analysis, sentiment lexicon, automatic expansion, domain expansion, Internet corpus