Research on Some Key Problems in Feature-Based Opinion Mining

Posted on: 2012-12-15
Degree: Master
Type: Thesis
Country: China
Candidate: P C Qiu
Full Text: PDF
GTID: 2248330371465524
Subject: Computer technology
Abstract/Summary:
The Internet has dramatically changed the way people express their views. People can not only review a product on an e-commerce site but also express their views on any topic in forums, discussion groups, blogs, and other online media. A popular product may attract thousands of comments, far more than a potential consumer can conveniently read. How to automatically extract representative opinions to support consumer decision-making has therefore become an active research direction in recent years.

Opinion mining involves feature extraction, opinion extraction, opinion orientation discrimination, and other tasks, and scholars have proposed many solutions to them. In feature extraction, the mainstream approach uses high-frequency nouns (or noun phrases) as candidate features, but it cannot effectively extract features that are representative yet infrequent. In opinion orientation discrimination, existing methods rely mainly on lexicons such as WordNet. However, lexicon-based methods cannot determine the orientation of every opinion word, which limits them considerably, and no Chinese lexicons are freely available.

To address these problems, this thesis makes the following contributions. First, to extract both high-frequency features and representative low-frequency features, we implemented a feature extraction method that combines frequency with TF-IDF values. Second, to discriminate opinion orientation, we propose several methods that use the rating scores provided by users, which overcomes the shortcomings of lexicon-based methods. Finally, the thesis presents our prototype opinion mining system, which consists of the following modules: data crawling and preprocessing, feature and opinion extraction, opinion orientation discrimination, and summarization.
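The abstract does not say how frequency and TF-IDF are combined, nor which rating-based methods are used, so the following Python sketch is only one plausible reading and not the thesis's actual method. It assumes candidate nouns and opinion words have already been extracted from each review. The function names, the thresholds `min_freq` and `min_tfidf`, and the neutral rating of 3 on a 1-to-5 scale are all hypothetical choices made for illustration.

```python
import math
from collections import Counter, defaultdict


def extract_features(reviews, min_freq=3, min_tfidf=0.8):
    """Select product features from candidate nouns (illustrative only).

    reviews: list of reviews, each a list of candidate noun tokens.
    A candidate is kept if it is frequent across the corpus, or if it has
    a high TF-IDF score in at least one review. Both thresholds are
    assumptions, not values from the thesis.
    """
    n_docs = len(reviews)
    corpus_freq = Counter(tok for review in reviews for tok in review)
    doc_freq = Counter(tok for review in reviews for tok in set(review))

    # Highest per-review TF-IDF seen for each candidate.
    best_tfidf = defaultdict(float)
    for review in reviews:
        for tok, count in Counter(review).items():
            tf = count / len(review)
            idf = math.log(n_docs / doc_freq[tok])
            best_tfidf[tok] = max(best_tfidf[tok], tf * idf)

    return {
        tok
        for tok in corpus_freq
        if corpus_freq[tok] >= min_freq or best_tfidf[tok] >= min_tfidf
    }


def orientation_from_ratings(rated_reviews, neutral=3.0):
    """Label opinion words using user ratings instead of a lexicon.

    rated_reviews: list of (opinion_words, rating) pairs.
    A word is labeled by comparing the mean rating of the reviews that
    contain it with an assumed neutral rating.
    """
    totals = defaultdict(float)
    counts = Counter()
    for opinion_words, rating in rated_reviews:
        for word in set(opinion_words):  # count each word once per review
            totals[word] += rating
            counts[word] += 1

    orientation = {}
    for word, n in counts.items():
        mean = totals[word] / n
        if mean > neutral:
            orientation[word] = "positive"
        elif mean < neutral:
            orientation[word] = "negative"
        else:
            orientation[word] = "neutral"
    return orientation
```

In this sketch a noun that appears in every review gets an IDF of zero, so it can only be selected through the frequency threshold, while a noun concentrated in a single review can still be selected through its TF-IDF score. That is the gap in frequency-only extraction that the abstract describes.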
Keywords: Opinion Mining, Feature Extraction, Opinion Orientation Discrimination, Machine Learning