Review Clustering Using Dirichlet Process Multinomial Mixture Models

Posted on: 2022-02-06
Degree: Master
Type: Thesis
Country: China
Candidate: M Y Peng
Full Text: PDF
GTID: 2518306335986819
Subject: Computer technology
Abstract/Summary:
E-commerce has become a very important part of the national economy and of social informatization. At the same time, online reviews form a rich resource for online marketing analysis: buyers browse quality evaluations when purchasing products, while merchants and suppliers mine reviews to gather market demand and to analyze user requirements and potential customers, which provides an important basis for analysis. Against this background, the main research contents of this thesis are as follows. The thesis makes an in-depth study of the non-parametric Dirichlet process multinomial mixture model (DPMM). The most prominent advantage of DPMM is that the number of clusters does not need to be specified in advance. On the basis of this research, a GSDPMMR model is proposed. The GSDPMMR model not only clusters review texts, but also integrates homogeneous data from the review dataset (review time and review score) into the clustering. For the sampling process, Gibbs sampling is studied and used to sample the model. Two further optimization schemes are then proposed on top of the model: GSDPMMR-nb, the nearest-neighbor variant of the model, and GSDPMMR-kd, the kernel density variant of the model. Because the clustering results are oriented toward anomaly detection and review spam detection, a model evaluation system is established to measure the clustering effect of the model on review data. Two key indicators, tightness and spamicity, are used to evaluate the tightness of the clustering, the spam rate of each cluster group, and other aspects. The experimental results show that the clustering model meets the objectives of this thesis and that its performance is significantly better than that of traditional review clustering algorithms. Relative to the SpEagle algorithm, the accuracy measured by spam rate among the top 1000 reviews improves by about 5%, and the spam rate of the top 100 clusters improves by about 19%, which offers a novel idea for the field of review group spam detection.
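To illustrate why a Dirichlet process multinomial mixture needs no preset cluster count, the following is a minimal sketch of collapsed Gibbs sampling over bag-of-words documents. It is not the thesis's GSDPMMR model: it clusters text only and ignores review time and review score. The function name `dpmm_gibbs`, the hyperparameters `alpha` and `beta`, and the defaults for `n_iters` and `seed` are assumptions made for illustration, and the assignment weights follow the usual Chinese restaurant process form with a symmetric Dirichlet prior on words.

```python
import math
import random
from collections import Counter


def dpmm_gibbs(docs, alpha=1.0, beta=0.1, n_iters=20, seed=0):
    """Illustrative collapsed Gibbs sampler for a DP multinomial mixture.

    docs: list of token lists. Returns one cluster label per document.
    alpha (concentration) and beta (word smoothing) are assumed values.
    """
    rng = random.Random(seed)
    vocab_size = len({w for d in docs for w in d})
    doc_counts = [Counter(d) for d in docs]

    # Start with every document in a single cluster (id 0).
    labels = [0] * len(docs)
    m = {0: len(docs)}                              # documents per cluster
    n = {0: sum(len(d) for d in docs)}              # tokens per cluster
    nw = {0: Counter(w for d in docs for w in d)}   # word counts per cluster
    next_id = 1

    def log_weight(doc_len, counts, size, word_counts, total):
        # log of: size * prod_w prod_j (n_z^w + beta + j)
        #              / prod_i (n_z + V * beta + i)
        lw = math.log(size)
        for w, c in counts.items():
            for j in range(c):
                lw += math.log(word_counts.get(w, 0) + beta + j)
        for i in range(doc_len):
            lw -= math.log(total + vocab_size * beta + i)
        return lw

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            # Remove document d from its current cluster.
            z = labels[d]
            m[z] -= 1
            n[z] -= len(doc)
            nw[z].subtract(doc_counts[d])
            if m[z] == 0:
                del m[z], n[z], nw[z]   # empty clusters disappear

            # Weight of each existing cluster, plus one brand-new cluster.
            candidates = list(m)
            logs = [log_weight(len(doc), doc_counts[d], m[k], nw[k], n[k])
                    for k in candidates]
            candidates.append(next_id)
            logs.append(log_weight(len(doc), doc_counts[d], alpha, {}, 0))

            # Normalise in log space, then sample a cluster.
            top = max(logs)
            weights = [math.exp(l - top) for l in logs]
            z = rng.choices(candidates, weights=weights)[0]

            if z == next_id:
                m[z], n[z], nw[z] = 0, 0, Counter()
                next_id += 1
            labels[d] = z
            m[z] += 1
            n[z] += len(doc)
            nw[z].update(doc_counts[d])
    return labels
```

Calling `dpmm_gibbs` on a list of tokenized reviews returns one label per review. The number of distinct labels is determined by the sampler rather than passed in: a document opens a new cluster with weight proportional to `alpha`, and a cluster is deleted as soon as its last document leaves.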
Keywords/Search Tags:Online reviews clustering, Dirichlet process multinomial mixture model, Novelty detection, Data fusion