Wikipedia Based Approach For Clustering Subjects Of Reviews

Posted on: 2016-01-25
Degree: Master
Type: Thesis
Country: China
Candidate: W C Yan
Full Text: PDF
GTID: 2308330467982354
Subject: Computer technology
Abstract/Summary:
With the development of the Internet, going online has become increasingly convenient. People can share their shopping experiences or movie reviews at any time and in any place, and as a result the number of online reviews has grown rapidly. Faced with a large number of reviews, users often cannot find the ones they are interested in right away, which wastes their time and degrades the user experience. Clustering the subjects of reviews groups reviews on similar topics into one category and makes them easier for users to read. Clustering subjects of reviews therefore has both theoretical and practical significance.

A novel method based on Chinese grammar for extracting subjects of reviews is proposed in this thesis. It extracts subjects of reviews accurately and efficiently, and serves as preparation for clustering them. The method consists of three main steps: 1) developing models for extracting subjects of reviews; 2) extracting frequent subjects of reviews; 3) extracting infrequent subjects of reviews. Compared with traditional methods for extracting subjects of reviews, this method achieves a better accuracy of 78.56% and a higher F-measure of 72.94%.

A novel method based on Wikipedia for clustering subjects of reviews is also proposed in this thesis. To make it easier for users to reach useful information, this method clusters similar reviews via topic words. It consists of four main steps: 1) establishing the vector space model of words; 2) choosing an appropriate formula to measure the similarity between words; 3) building the word similarity matrix; 4) selecting an appropriate clustering algorithm to cluster subjects of reviews. With an accuracy of 75.68% and an F-measure of 76.87%, it outperforms the traditional methods.

In summary, the thesis first discusses the importance of clustering subjects of reviews and, on that basis, proposes a novel method based on Chinese grammar for extracting subjects of reviews and a novel method based on Wikipedia for clustering them. Experiments show that both methods achieve higher accuracy than traditional methods.
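
The four clustering steps can be illustrated with a minimal Python sketch. The abstract does not say which similarity formula or clustering algorithm the thesis uses, so this is not the author's implementation: cosine similarity and a single-link threshold grouping are assumptions chosen for brevity, and the `ARTICLES` dictionary is hypothetical toy data standing in for Wikipedia articles. All function names are illustrative.

```python
import math

# Hypothetical toy stand-in for Wikipedia articles (not data from the thesis).
ARTICLES = {
    "Display": "screen resolution pixel brightness screen",
    "Battery": "battery charge power capacity battery",
    "Camera": "photo lens pixel picture lens",
}


def word_vector(word, articles):
    """Step 1: vector space model. One dimension per article,
    weighted by how often the word occurs in that article."""
    return [text.split().count(word) for text in articles.values()]


def cosine(u, v):
    """Step 2: similarity formula (cosine similarity is an assumption)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_matrix(words, articles):
    """Step 3: pairwise word similarity matrix."""
    vectors = [word_vector(w, articles) for w in words]
    return [[cosine(u, v) for v in vectors] for u in vectors]


def cluster(words, sim, threshold=0.8):
    """Step 4: clustering. Assumed single-link grouping: any two words
    whose similarity reaches the threshold end up in the same cluster."""
    parent = list(range(len(words)))

    def find(i):
        # Follow parent links to the cluster representative.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(words)):
        for j in range(i + 1, len(words)):
            if sim[i][j] >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i, w in enumerate(words):
        groups.setdefault(find(i), []).append(w)
    return list(groups.values())


if __name__ == "__main__":
    subjects = ["screen", "resolution", "battery", "charge", "photo", "lens"]
    matrix = similarity_matrix(subjects, ARTICLES)
    for group in cluster(subjects, matrix):
        print(group)
```

In this toy setup, subject words that occur in the same article get identical direction vectors and are grouped together, while a word such as "pixel" that is spread across two articles falls below the 0.8 threshold against "screen". A real system would replace the toy dictionary with concept vectors derived from Wikipedia and would tune the threshold or swap in another clustering algorithm.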
Keywords/Search Tags: Wikipedia, Similarity Matrix, Keywords, Word Clustering