
Research On Key Mining Techniques Of Product Reviews In Chinese

Posted on: 2010-09-07
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y W Huang
Full Text: PDF
GTID: 1118360275474165
Subject: Computer software and theory
Abstract/Summary:
With the rapid growth of the web, more and more product reviews are appearing online that record customer experience and reflect opinions on product features, functions and properties. By consulting product reviews, customers can choose the products that suit them best, and manufacturers can improve their products and become more competitive. Product review mining is therefore becoming increasingly important. This thesis applies machine learning techniques to product review mining: short-text classification, the mining of feature-opinion pairs, an optimization algorithm for feature-opinion pairs, and the extraction of hierarchical relationships among product features. The main contributions of this thesis are summarized as follows.

A product review classification method based on semantic features is proposed. Automatically classifying product reviews provides better research material, which reduces the complexity of the review mining algorithm and so improves mining efficiency. In this thesis, product reviews are classified as short texts. First, product reviews obtained from the web are manually labeled to form the training set. Then the top-ranked χ² statistics of the product reviews and their semantic contents (product features, opinion words, degree words) are extracted as classification features; the amount of semantic information, the semantic contents that were not selected, and the length of the text are added as further classification features. A binary support vector machine (SVM) then learns from the extracted classification features to obtain the classifier. Finally, the constantly updated online product reviews are classified, and the good reviews are extracted to build review corpora.
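
The χ² scoring step mentioned above can be illustrated with a minimal sketch. It assumes the standard 2x2 contingency-table form of the χ² statistic for a term against a binary label; the function name `chi_square_scores`, the tokenized input format and the English toy tokens are hypothetical and are not taken from the thesis, which does not specify its exact formulation here.

```python
from collections import Counter


def chi_square_scores(docs, labels):
    """Score each token by the chi-square statistic of its 2x2
    contingency table against a binary label (hypothetical helper).

    docs   -- list of token lists, one per review
    labels -- list of booleans, True for the positive class
    """
    n = len(docs)
    n_pos = sum(1 for y in labels if y)
    n_neg = n - n_pos

    # Document frequency of each token within each class.
    pos_df, neg_df = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        (pos_df if y else neg_df).update(set(tokens))

    scores = {}
    for term in set(pos_df) | set(neg_df):
        a = pos_df[term]   # positive reviews containing the term
        b = neg_df[term]   # negative reviews containing the term
        c = n_pos - a      # positive reviews without the term
        d = n_neg - b      # negative reviews without the term
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def top_terms(scores, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

In a pipeline of the kind described, the highest-scoring terms would be combined with the semantic features (product features, opinion words, degree words) before training the binary SVM; that combination step is not shown here.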
Experiments show that the classification results for product reviews improve markedly when semantic content is added: precision improves by 9 percent and reaches 80 percent. The method classifies product reviews, which are short texts, very well.

A semi-supervised learning method is adopted for product review mining, and the mining of features and the mining of opinion words are combined in a unified process to obtain feature-opinion pairs. Because features and opinions are linked by modifying relations, this thesis extracts the features (such as product components, functions and performance) together with the opinion words that express customer sentiment, and so retains the correspondence between customer opinion words and product features. A semi-supervised learning method can not only obtain expert knowledge from the labeled corpus but also improve the generalization ability of the learning algorithm by using unlabeled data. A handful of defined feature-opinion pairs therefore serve as seeds, while words, parts of speech and modifying relations are taken as the pattern feature set used to mine the product features and evaluations in which customers are really interested. Evaluations that contain multiple features but a single opinion are then processed using the obtained product features and opinion words. Experimental results show that both precision and recall improve by 2 percent after this processing. Although precision is not high when features and opinion words are mined in a unified process, the high recall helps the semi-supervised learning algorithm mine new information.

The sequences of opinions are optimized with Maximize Harmonic-Mean (MHM) to improve mining performance.
Because the accuracy of a semi-supervised learning method decreases sharply as the iterations proceed, and the harmonic mean is easily influenced by extreme values, especially the minimum, sequences with a large standard deviation are adjusted with MHM to delete their low-frequency elements, which preserves recall and improves accuracy. Experimental results show that precision reaches 77.3 percent: it improves by 17 percent while recall falls by only 5 percent.

A method for extracting the hierarchical relationships among features is proposed. The hierarchical relationships of the specification features are extracted from product specification files with a structured data mining method, and a bootstrapping method is used to extract the hierarchical relationships of the descriptive features from editor evaluations. After identifying the features and their corresponding opinion words, existing review mining systems do not further process features that are expressed in different ways or that stand in a subordinate relationship, so the same feature in different phrasings may be shown as different features, and subordinate features may be shown as parallel ones. In this thesis, a structured data mining method is applied to manufacturer product specifications to obtain the specification features and their hierarchical relationships, and a semi-supervised learning method is applied to editor evaluations on the web site to obtain the descriptive features and their hierarchical relationships. The similarity between the specification features and the descriptive features extracted from a paragraph is then compared to obtain their hierarchical relationships.

The extracted feature-opinion pairs are then connected with the hierarchical relationships among the features. Different expressions of the same feature are merged, and features in a subordinate relationship are grouped together.
Finally, the opinions on each feature are counted, and the product features at different levels are displayed from top to bottom as a tree.
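
The final merging and display step can be sketched roughly as follows. This is an illustration under stated assumptions, not the thesis's implementation: the synonym map, the parent map and the function names `build_feature_tree` and `render` are all hypothetical, and the parent map is assumed to contain no cycles.

```python
from collections import Counter, defaultdict


def build_feature_tree(pairs, parent_of, synonyms):
    """Merge feature-opinion pairs into a hierarchy (hypothetical helper).

    pairs     -- iterable of (feature, opinion_word) tuples
    parent_of -- dict mapping a feature to its parent feature (acyclic)
    synonyms  -- dict mapping an alternative expression to its canonical feature
    """
    # Count opinion words per canonical feature, merging different expressions.
    opinions = defaultdict(Counter)
    for feature, opinion in pairs:
        opinions[synonyms.get(feature, feature)][opinion] += 1

    # Group features under their parents; top-level features hang under None.
    children = defaultdict(list)
    nodes = set(opinions) | set(parent_of) | set(parent_of.values())
    for node in sorted(nodes):
        children[parent_of.get(node)].append(node)
    return opinions, children


def render(opinions, children, node=None, depth=0):
    """Return the tree as indented text lines, top level first."""
    lines = []
    for child in children.get(node, []):
        counts = opinions.get(child, {})
        summary = ", ".join(f"{word} x{count}" for word, count in sorted(counts.items()))
        label = f"{child}: {summary}" if summary else child
        lines.append("  " * depth + label)
        lines.extend(render(opinions, children, child, depth + 1))
    return lines
```

With toy input such as the pairs `("screen", "clear")` and `("display", "clear")`, a synonym entry mapping `display` to `screen`, and a parent entry placing `resolution` under `screen`, the two expressions are counted as one feature and `resolution` is listed one level below it.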
Keywords/Search Tags: Product Reviews Mining, Semi-Supervised Learning, Support Vector Machine, Short Document Classification, Sequence Optimization