| With the boom of Internet technology, more and more users participate in the joint construction of the Internet, and users who were once passive recipients of information have gradually become active creators of information. As a result, the Internet now holds many valuable comments about people and products. These comments reflect users' standpoints and views and are of considerable research value. However, as more and more users share their views and standpoints on the Internet, the volume of comment information is growing rapidly, and manual methods can no longer analyze and process it. Comment mining technology has emerged in response; it mainly comprises feature mining, opinion extraction, sentiment analysis, and related techniques. The first task of comment mining is to mine the features of the reviewed object, and the accuracy and comprehensiveness of this step are important for follow-up study. Because different words can be used to describe the same feature in a review, extracting and clustering the features of the comment object is challenging. This paper analyzes and studies the extraction of comment object features from Chinese customer comments. The main research contents are summarized as follows. To extract the features that customers are concerned with from Chinese customer reviews, this study draws on the theory of association rules, using the Apriori algorithm to extract frequent item sets and then applying three rules to prune them. Then, because the precision of feature extraction based on the Apriori algorithm is not good, the concept of domain terminology is introduced to improve the precision of the mining method. 
This paper treats each comment object feature as a domain term of the comment field, uses domain consensus and domain relevance to calculate how strongly each candidate feature is associated with the comment field, ranks the features by this degree of association, and filters out those with a relatively low degree to improve mining performance. This paper also improves a method for calculating the semantic similarity between features, considering not only the similarity between the words themselves but also the co-occurrence of feature words and opinion words. A new semantic similarity measure between features is proposed that combines word similarity based on HowNet with similarity based on the co-occurrence information between features and opinions. Finally, a feature clustering algorithm based on the semantic similarity between features is proposed. The algorithm groups together features that have a certain degree of similarity, so that a feature expressed in different ways in the comments is not treated as several separate features. For this thesis, customer reviews were downloaded from the Internet and used to verify all the proposed algorithms in experiments; the results show that the method achieves good extraction performance. |
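The frequent item set step mentioned in the abstract can be illustrated with a minimal level-wise Apriori sketch. It assumes each review sentence has already been reduced to a set of candidate feature nouns; the toy data, the function name `apriori`, and the `min_support` count threshold are illustrative assumptions, and the three pruning rules are not shown because the abstract does not state them.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset found in at least
    min_support transactions (min_support is an absolute count)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        # Prune step: keep a candidate only if all of its (k-1)-subsets
        # are frequent (the Apriori property).
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Toy data (invented): candidate feature nouns per review sentence.
reviews = [
    {"screen", "battery"},
    {"screen", "price"},
    {"battery", "life"},
    {"battery", "life", "screen"},
    {"price"},
]
frequent = apriori(reviews, min_support=2)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

In this sketch the frequent item sets serve as candidate features; a real system would then apply pruning rules and the domain-term filtering described above.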
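The combined similarity and the clustering step might be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the formula, so a weighted sum with a hypothetical weight `alpha` is used; the `WORD_SIM` table is an invented stand-in for a HowNet-based word similarity; co-occurrence similarity is taken as the cosine between opinion-word count vectors; and clustering is a simple threshold-based single-link grouping rather than the thesis's actual algorithm.

```python
import math
from collections import Counter

# Hypothetical stand-in for HowNet-based word similarity (values invented).
WORD_SIM = {
    frozenset(["screen", "display"]): 0.9,
    frozenset(["price", "cost"]): 0.8,
}


def word_similarity(a, b):
    if a == b:
        return 1.0
    return WORD_SIM.get(frozenset([a, b]), 0.0)


def cooccurrence_similarity(a, b, opinions):
    """Cosine similarity between the opinion-word count vectors of two features."""
    va, vb = opinions.get(a, Counter()), opinions.get(b, Counter())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def feature_similarity(a, b, opinions, alpha=0.5):
    """Weighted sum of word similarity and co-occurrence similarity (assumed form)."""
    return alpha * word_similarity(a, b) + (1 - alpha) * cooccurrence_similarity(
        a, b, opinions
    )


def cluster_features(features, opinions, threshold=0.6, alpha=0.5):
    """Group features whose similarity reaches the threshold (single-link, union-find)."""
    parent = {f: f for f in features}

    def find(f):
        while parent[f] != f:
            parent[f] = parent[parent[f]]  # path compression
            f = parent[f]
        return f

    for i, a in enumerate(features):
        for b in features[i + 1:]:
            if feature_similarity(a, b, opinions, alpha) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for f in features:
        groups.setdefault(find(f), set()).add(f)
    return sorted(sorted(g) for g in groups.values())


# Toy data (invented): opinion words observed with each feature word.
opinions = {
    "screen": Counter({"clear": 3, "bright": 1}),
    "display": Counter({"clear": 2, "bright": 2}),
    "price": Counter({"cheap": 2}),
    "cost": Counter({"cheap": 1, "high": 1}),
    "battery": Counter({"durable": 2}),
}
features = ["screen", "display", "price", "cost", "battery"]
clusters = cluster_features(features, opinions)
print(clusters)
```

With this toy data, "screen" and "display" fall into one group and "price" and "cost" into another, while "battery" stays on its own. The weight `alpha` and the threshold would need to be tuned on real review data.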