
Study On Feature Word Extraction And Semantic Orientation Analysis In Chinese Opinion Mining

Posted on: 2011-05-29
Degree: Master
Type: Thesis
Country: China
Candidate: C Q Li
GTID: 2178360308458752
Subject: Computer software and theory
Abstract/Summary:
With the rapid development and spread of the Internet, the Web has become an important platform for publishing consumer product reviews. Faced with this large and complex body of online comments, product manufacturers and consumers need a quick and effective way to determine the overall sentiment orientation (positive or negative) of the reviews, and this has become a new problem. Opinion mining technology addresses exactly this problem. It combines information retrieval, information extraction, text classification, machine learning, natural language processing, ontologies, and other techniques. Because it has a certain ability to understand text, it shows a stronger artificial-intelligence character.
Opinion mining comprises four tasks: Topic Extraction, Holder Identification, Claim Delimitation, and Sentiment Analysis, of which Topic Extraction and Sentiment Analysis are the foundation and the focus. This thesis uses Chinese product reviews from a specific domain (mobile digital devices) as its research corpus and concentrates on the first and fourth tasks of opinion mining, namely Topic Extraction and Sentiment Analysis.
The thesis addresses two problems: 1) how to identify and extract feature words and polarity words from product reviews; 2) how to determine the sentiment orientation of the polarity words. For the first problem, we propose a method based on Chinese Syntax Patterns (CSP). Drawing on Chinese linguistics research and applying statistical methods to a training data set, the method finds the sentence syntax patterns in which adjectives are most commonly used, and uses these patterns to extract feature words and polarity words for Chinese opinion mining. Experiments and comparative tests show that the method achieves good results. For the second problem, we compute polarity with the traditional search-engine-based SO-PMI method and, for comparison, with a support vector machine (SVM) classifier grounded in statistical learning theory.
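The two steps above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the POS-tag patterns in `PATTERNS` are hypothetical stand-ins for the CSP patterns (which the abstract does not list), and search-engine hit counts are passed in as plain dictionaries instead of being queried live. The `so_pmi` function follows Turney's hit-count form of SO-PMI: log2 of [product of hits(word NEAR p) times product of hits(n)] divided by [product of hits(p) times product of hits(word NEAR n)], over positive seed words p and negative seed words n, with 0.01 added to NEAR counts to avoid division by zero.

```python
import math

# Hypothetical CSP-style patterns over POS tags (assumption: these are NOT the
# thesis's actual patterns). "n" = noun, "d" = adverb, "a" = adjective.
PATTERNS = [("n", "a"), ("n", "d", "a")]


def extract_pairs(tagged):
    """Return (feature word, polarity word) pairs for every pattern match.

    `tagged` is a list of (word, pos_tag) tuples produced by some POS tagger.
    The first token of a match is taken as the feature word and the last
    token (the adjective) as the polarity word.
    """
    pairs = []
    for i in range(len(tagged)):
        for pattern in PATTERNS:
            window = tagged[i:i + len(pattern)]
            tags = tuple(tag for _, tag in window)
            # A window cut short at the end of the sentence never matches.
            if tags == pattern:
                pairs.append((window[0][0], window[-1][0]))
    return pairs


def so_pmi(word, pos_seeds, neg_seeds, hits, near_hits, smoothing=0.01):
    """Semantic orientation of `word` from search-engine hit counts.

    hits[s] is the (assumed positive) hit count for seed word s alone;
    near_hits[(word, s)] is the hit count for the query "word NEAR s".
    Missing NEAR counts are treated as 0 before smoothing. A positive result
    suggests positive orientation, a negative result negative orientation.
    """
    numerator = 1.0
    denominator = 1.0
    for p in pos_seeds:
        numerator *= near_hits.get((word, p), 0) + smoothing
        denominator *= hits[p]
    for n in neg_seeds:
        numerator *= hits[n]
        denominator *= near_hits.get((word, n), 0) + smoothing
    return math.log2(numerator / denominator)
```

For example, `extract_pairs([("屏幕", "n"), ("很", "d"), ("清晰", "a")])` ("the screen is very clear") returns `[("屏幕", "清晰")]`. With made-up counts `hits = {"好": 1000, "差": 1000}` and `near_hits = {("清晰", "好"): 200, ("清晰", "差"): 20}`, `so_pmi("清晰", ["好"], ["差"], hits, near_hits)` is positive (about 3.32), so "清晰" would be labeled positive. Passing counts in as data keeps the scoring logic separate from whichever search engine, and whichever NEAR semantics, supplies them.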
Because the NEAR operator used by the SO-PMI algorithm requires further study, the SVM method performs better than SO-PMI. From a practical standpoint, however, SO-PMI is easier to apply, whereas SVM needs a large amount of training data. In addition, this thesis builds a preliminary review-mining system, Digi-OMS, which includes a feature-word extraction module, a polarity-word extraction module, and a polarity identification module. The thesis also constructs a polarity dictionary for the system, which plays an important role in polarity-word extraction and polarity classification. Combining the polarity dictionary with a set of negation words and a set of degree adverbs, the thesis further proposes a method for computing sentence polarity. Fairly comprehensive experiments with Digi-OMS on domain-specific Chinese reviews verify that the proposed methods are sound and effective; overall, the automated analysis results are reasonably good.
The main contributions of this thesis are: 1) a new approach to the feature-word extraction problem in Chinese opinion mining; 2) two methods for classifying the polarity of polarity words, together with a comparison between them; 3) a preliminary domain-specific opinion mining system; 4) a method for constructing a polarity dictionary and for analyzing sentence sentiment.
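The sentence-polarity computation described above, which combines a polarity dictionary with negation words and degree adverbs, might look roughly like the following Python sketch. Everything concrete in it is an assumption for illustration, since the abstract gives no formula: the dictionary entries in `POLARITY`, the words in `NEGATIONS`, the weights in `DEGREE`, and the rule that the contiguous run of modifiers immediately before a polarity word scales or flips its value. Input is assumed to be already word-segmented.

```python
# Hypothetical resources (assumption: the thesis's actual polarity dictionary,
# negation list and degree-adverb weights are not given in the abstract).
POLARITY = {"清晰": 1, "流畅": 1, "卡": -1, "差": -1}  # clear, smooth, laggy, bad
NEGATIONS = {"不", "没", "没有"}  # negation words ("not", "have not")
DEGREE = {"很": 1.5, "非常": 2.0, "有点": 0.5}  # very, extremely, a bit


def sentence_polarity(tokens):
    """Score a word-segmented sentence: > 0 positive, < 0 negative, 0 neutral.

    Each polarity word contributes its dictionary value, multiplied by the
    weight of every degree adverb and sign-flipped by every negation word in
    the contiguous run of modifiers immediately before it.
    """
    score = 0.0
    for i, token in enumerate(tokens):
        if token not in POLARITY:
            continue
        value = float(POLARITY[token])
        j = i - 1
        # Walk backwards while the preceding token is a modifier.
        while j >= 0 and (tokens[j] in NEGATIONS or tokens[j] in DEGREE):
            if tokens[j] in NEGATIONS:
                value = -value
            else:
                value *= DEGREE[tokens[j]]
            j -= 1
        score += value
    return score


def sentence_label(tokens):
    """Map the numeric score to 'positive', 'negative' or 'neutral'."""
    score = sentence_polarity(tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

For instance, `["运行", "非常", "流畅", "但是", "有点", "卡"]` ("runs extremely smoothly but is a bit laggy") scores 2.0 + (-0.5) = 1.5, and `["屏幕", "不", "清晰"]` scores -1.0. Treating every preceding modifier the same way ignores word-order effects (for example "很不" versus "不很"), which a fuller method would likely need to handle.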
Keywords/Search Tags: Opinion Mining, Feature Word Extraction, Polarity Word Extraction, Semantic Orientation Analysis