Research And Implementation Of Mining Customers' Reviews

Posted on: 2010-06-04
Degree: Master
Type: Thesis
Country: China
Candidate: N Liu
GTID: 2178360272997474
Subject: Computer application technology
Abstract/Summary:
With the rapid development of e-commerce, online shopping has become familiar to many people, and more and more of them can buy the goods they want without leaving home. To serve consumers better and improve their shopping experience, many merchants provide a review platform on their websites, so that consumers can promptly feed their comments back to merchants and to potential customers. However, as the number of product reviews grows exponentially, reading all of them before making a purchase decision becomes very difficult, so an effective method for mining product reviews is urgently needed.
The development of business intelligence has laid a solid foundation for the spread of e-commerce. Business intelligence refers to technology that turns the data stored in various commercial systems into useful information. It lets business users identify, through database queries and analysis, the key factors that influence commercial activity, and ultimately make better and more rational business decisions, so that enterprises gain the greatest possible competitive advantage in a rapidly changing and competitive market. Online analytical processing and data mining are tools that help companies achieve this goal at different levels.
This thesis belongs to the research and application of data mining technology. Data mining (DM) refers to extracting knowledge of interest from databases or large-scale data warehouses; such knowledge is implicit, previously unknown, and potentially useful. It combines theory and techniques from databases, artificial intelligence, machine learning, statistics, and other fields, and it is a promising new area of database research with considerable application value.
Data mining tools can analyze data in depth and predict future trends and behavior, which makes them an important component of business intelligence. Association rule mining has long been a focus of data mining research. An association rule describes the interdependence between one item and other items, and mining such rules is generally divided into two steps:
1. Find all frequent itemsets whose support is greater than or equal to the minimum support threshold.
2. From the frequent itemsets, generate the association rules that meet the minimum confidence threshold.
Customer review mining is an emerging research area whose techniques are still immature. This thesis combines Chinese word segmentation, text mining, and association rule mining. It first introduces text mining from the perspective of its application to reviews, describes current text mining techniques in detail, and presents the text classification process: text vectorization, classifier training, classification, and evaluation of the classification results. It then describes four text classification techniques in detail (the simple vector distance classifier, KNN, Naive Bayes, and support vector machines) and compares their advantages and disadvantages.
Starting from the concepts and technologies of text mining, this thesis focuses on association rule mining and on the implementation of a customer review mining algorithm. For association rule mining, both the Apriori algorithm and the FP-Growth algorithm are implemented. During implementation, the input process of the Apriori algorithm is optimized, and experimental comparison shows that this significantly improves its running efficiency.
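The two-step association rule procedure described above can be illustrated with a minimal Apriori-style sketch. This is not the thesis's optimized implementation; the function names, the toy transactions, and the threshold values are assumptions chosen only for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Step 1: return {itemset: support} for itemsets with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Start from all candidate 1-itemsets.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate.
        level = {}
        for candidate in candidates:
            support = sum(1 for t in transactions if candidate <= t) / n
            if support >= min_support:
                level[candidate] = support
        frequent.update(level)
        # Join step: build (k+1)-itemsets whose k-subsets are all frequent.
        k += 1
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in level for s in combinations(union, k - 1)
            ):
                candidates.add(union)
    return frequent


def association_rules(frequent, min_confidence):
    """Step 2: return (antecedent, consequent, confidence) with confidence >= min_confidence."""
    rules = []
    for itemset, support in frequent.items():
        for size in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, size)):
                # Every subset of a frequent itemset is itself frequent,
                # so its support is always available here.
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules


# Hypothetical toy data: each transaction is the set of terms in one review.
reviews = [
    {"screen", "battery"},
    {"screen", "battery", "price"},
    {"screen", "price"},
    {"battery"},
]
itemsets = frequent_itemsets(reviews, min_support=0.5)
rules = association_rules(itemsets, min_confidence=0.9)
```

The sketch rescans all transactions at every level, which is the cost that FP-Growth avoids by compressing the data into a prefix tree.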
The pre-optimization Apriori algorithm and the implemented FP-Growth algorithm are then compared in detail on running time, accuracy, and coverage. Based on the experimental results, FP-Growth is selected as the mining module for customer review mining.
The other focus of this thesis, and its ultimate goal, is the implementation of the customer review mining algorithm. To mine customer reviews, the original review sentences must first be segmented into words. Most existing segmentation techniques target English, while Chinese word segmentation is still developing rapidly. This thesis uses ICTCLAS, the Chinese Lexical Analysis System of the Chinese Academy of Sciences, as its Chinese word segmentation module. ICTCLAS segments text at 996 KB/s on a single machine with a segmentation accuracy of 98.45%; its API does not exceed 200 KB, and its dictionary data occupy less than 3 MB after compression. It is described as currently the world's best Chinese lexical analyzer. The demonstration on specific reviews in this thesis also shows that ICTCLAS handles the Chinese word segmentation task very well.
After Chinese word segmentation, the review mining algorithm studied and implemented here has three important steps: frequent feature identification, opinion word extraction and polarity determination, and review sentence polarity identification. Frequent feature identification finds, in the review sentences, the product attributes that consumers care about most; this step uses the FP-Growth algorithm. A feature may be a single term or a phrase composed of several terms. This thesis defines a feature phrase as consisting of at most three terms.
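As a rough illustration of frequent feature identification with the three-term limit, the sketch below counts noun combinations of one to three terms per review sentence and keeps those that reach a minimum support. It substitutes plain counting for the FP-Growth algorithm that the thesis uses, and it assumes that the segmenter output is a list of (word, tag) pairs in which noun tags start with "n"; the function name, the tag convention, and the English placeholder tokens are all hypothetical.

```python
from collections import Counter
from itertools import combinations

MAX_FEATURE_TERMS = 3  # a feature phrase has at most three terms


def candidate_features(tagged_sentences, min_support):
    """Return {noun_tuple: sentence_count} for noun combinations that are frequent."""
    counts = Counter()
    for sentence in tagged_sentences:
        # Assumption: tags beginning with "n" mark nouns.
        nouns = sorted({word for word, tag in sentence if tag.startswith("n")})
        for size in range(1, MAX_FEATURE_TERMS + 1):
            for combo in combinations(nouns, size):
                counts[combo] += 1
    threshold = min_support * len(tagged_sentences)
    return {combo: count for combo, count in counts.items() if count >= threshold}


# Hypothetical segmented and tagged review sentences.
sentences = [
    [("phone", "n"), ("screen", "n"), ("very", "d"), ("clear", "a")],
    [("screen", "n"), ("good", "a")],
    [("battery", "n"), ("life", "n"), ("short", "a")],
    [("battery", "n"), ("life", "n"), ("long", "a")],
]
features = candidate_features(sentences, min_support=0.5)
```

Enumerating every combination is only practical for short sentences; a frequent pattern miner such as FP-Growth reaches the same frequent sets without enumerating infrequent ones.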
Features that consist of a single term, or of a noun phrase whose terms are not separated, are identified in the first step of association rule mining; features whose terms are separated are found in the second step. After the candidate set of frequent features has been identified, it is pruned to reduce the algorithm's error.
Extracting opinion words and determining their polarity is the key to the success or failure of the customer review mining algorithm. An opinion word is a word or phrase with which a reviewer expresses a positive or negative view of a product feature. Determining the polarity of an opinion word means judging whether the word is commendatory or derogatory, and it is the most important prerequisite for determining the semantic orientation of a review sentence. This thesis selects as the opinion word the word that is nearest to a frequent feature and follows it to describe it, and finds that the accuracy of this method is relatively high.
To determine polarity, this thesis classifies opinion words into two types, commendatory and derogatory, and uses SQL Server 2000 to create a table named "dic". The table has two attributes, "word" and "pos", which store the opinion word and its orientation respectively. The values "1" and "-1" represent the commendatory and derogatory orientations.
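The opinion word selection rule described above (the nearest word that follows a frequent feature and describes it) can be sketched as follows. The sketch assumes that describing words are those tagged as adjectives with the tag "a", which the abstract does not state; the function name, the tag, and the sample tokens are hypothetical.

```python
def extract_opinion_words(tagged_sentence, frequent_features, opinion_tag="a"):
    """Pair each frequent feature with the nearest following word tagged opinion_tag."""
    pairs = []
    for index, (word, _tag) in enumerate(tagged_sentence):
        if word not in frequent_features:
            continue
        # Scan forward from the feature and stop at the first candidate.
        for later_word, later_tag in tagged_sentence[index + 1:]:
            if later_tag == opinion_tag:
                pairs.append((word, later_word))
                break
    return pairs


# Hypothetical segmented and tagged review sentence.
sentence = [
    ("screen", "n"), ("very", "d"), ("clear", "a"),
    ("but", "c"), ("battery", "n"), ("weak", "a"),
]
pairs = extract_opinion_words(sentence, {"screen", "battery"})
```

A feature with no following candidate yields no pair, so such sentences contribute no opinion word for that feature.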
To determine the orientation of an opinion word, the word is first looked up in table "dic". If it is found, it is tagged with the stored orientation. If it is not found, manual identification is required: if the user knows the word's orientation, the word is added to "dic" together with that orientation; otherwise, the word is discarded.
The final chapter of this thesis implements polarity determination for review sentences. As with opinion words, determining the polarity of a sentence means judging whether the review sentence is commendatory or derogatory. Review sentences are divided into three categories according to the opinion words they contain: sentences in which most opinion words share one polarity; sentences that contain equal numbers of commendatory and derogatory opinion words; and all remaining cases. For the first category, the polarity of the sentence is determined by the algebraic sum of the polarities of its opinion words. For the second category, the effective opinion words of all frequent features in the sentence are found, and the polarity of the sentence is determined from them. For the third category, the sentence is assigned the polarity of the preceding review sentence.
The author has thus implemented an improved customer review mining algorithm. As e-commerce develops rapidly, the number of online transactions will keep increasing, and the number of online product reviews will grow with it, so customer review mining will remain a hot topic in text mining for some time to come. As mining algorithms continue to improve and new ones emerge, text mining will surely be able to provide consumers with reliable and convenient services and contribute to e-commerce.
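The "dic" lookup and the three-category sentence rule described in the abstract can be sketched as below. An in-memory SQLite database stands in for SQL Server 2000, words missing from "dic" are simply skipped instead of being sent for manual identification, and the caller is assumed to supply the effective opinion words and the polarity of the preceding sentence. The function names and the fallback to the preceding polarity when the effective words also cancel out are assumptions, not details from the thesis.

```python
import sqlite3


def build_dic(entries):
    """Create an in-memory stand-in for table "dic" with columns word and pos."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dic (word TEXT PRIMARY KEY, pos INTEGER)")
    conn.executemany("INSERT INTO dic (word, pos) VALUES (?, ?)", entries)
    return conn


def word_polarity(conn, word):
    """Return 1 (commendatory), -1 (derogatory), or None if the word is not in dic."""
    row = conn.execute("SELECT pos FROM dic WHERE word = ?", (word,)).fetchone()
    return row[0] if row else None


def known_polarities(conn, words):
    """Look up each word and drop the ones that dic does not contain."""
    found = (word_polarity(conn, word) for word in words)
    return [polarity for polarity in found if polarity is not None]


def sentence_polarity(conn, opinion_words, effective_words, previous_polarity):
    """Apply the three-category rule to one review sentence."""
    scores = known_polarities(conn, opinion_words)
    total = sum(scores)
    if scores and total != 0:
        # Category 1: one polarity dominates, so use the sign of the sum.
        return 1 if total > 0 else -1
    if scores:
        # Category 2: equal counts, so decide from the effective opinion words.
        effective_total = sum(known_polarities(conn, effective_words))
        if effective_total != 0:
            return 1 if effective_total > 0 else -1
    # Category 3 (and unresolved ties, by assumption): reuse the preceding polarity.
    return previous_polarity


# Hypothetical dictionary entries.
dic = build_dic([("clear", 1), ("good", 1), ("weak", -1)])
polarity = sentence_polarity(dic, ["clear", "good", "weak"], [], previous_polarity=-1)
```

Passing the preceding polarity explicitly keeps the function stateless, so a caller can process the sentences of a review in order and feed each result into the next call.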
Keywords/Search Tags: Text mining, Association rules, Opinion word extraction, Polarity identification