Research On Identifying Review Spam For Product Reviews Based On Data Mining

Posted on: 2015-08-29
Degree: Master
Type: Thesis
Country: China
Candidate: Y Chen
Full Text: PDF
GTID: 2298330422970345
Subject: Master of Engineering
Abstract/Summary:
Product reviews can be posted freely on the Internet without constraint, so some reviews contain useless or untrue information. Such reviews are review spam. Whether consumers consult online reviews when shopping or businesses analyze those reviews for evaluation, this spam seriously hinders their access to information. Automatic identification of review spam is therefore an urgent need for both consumers and businesses.

This thesis focuses on identifying review spam for electronic products using data mining techniques. The main work includes the following.

Product reviews are short and contain Internet slang, so they must be preprocessed first. A sentiment dictionary is then established, and spam reviews are identified according to that dictionary.

A dictionary of product feature words is built from the product specification. The features constructed for each review include product relevance, hyperlink features, and features related to consecutive digits and consulting. A KNN classifier is built on these features to recognize review spam.

The thesis improves the KNN classifier in order to raise the accuracy of spam identification and speed up recognition. The improvement has two aspects: a dynamic K value and a weighted distance formula.

Some fake reviews and advertisements resemble normal reviews in content but tend to be posted repeatedly. The final step is therefore to recognize duplicate spam reviews. The thesis represents text with a 2-gram model, smooths the model with the Katz smoothing method, and uses KL divergence to identify duplicates.
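The duplicate-detection idea (a 2-gram representation compared with KL divergence) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function names are hypothetical, tokens are assumed to be whitespace-separated words, add-k smoothing is swapped in as a simpler stand-in for Katz smoothing, and the divergence is symmetrized so that the order of the two texts does not matter.

```python
import math
from collections import Counter


def bigrams(text):
    """Split on whitespace and return the list of adjacent word pairs."""
    tokens = text.lower().split()
    return list(zip(tokens, tokens[1:]))


def smoothed_distribution(counts, support, k=0.5):
    """Add-k smoothed probabilities over a fixed support.

    Assumption: add-k smoothing stands in for Katz smoothing here.
    """
    total = sum(counts[g] for g in support) + k * len(support)
    return {g: (counts[g] + k) / total for g in support}


def kl_divergence(p, q):
    """KL(P || Q) in nats; p and q must share the same keys."""
    return sum(p[g] * math.log(p[g] / q[g]) for g in p)


def symmetric_kl(text_a, text_b, k=0.5):
    """Symmetrized KL divergence between the bigram distributions of two texts."""
    counts_a = Counter(bigrams(text_a))
    counts_b = Counter(bigrams(text_b))
    support = set(counts_a) | set(counts_b)
    if not support:
        # Neither text has a bigram, so there is nothing to distinguish them.
        return 0.0
    p = smoothed_distribution(counts_a, support, k)
    q = smoothed_distribution(counts_b, support, k)
    return kl_divergence(p, q) + kl_divergence(q, p)
```

A lower value means the two reviews have more similar bigram distributions. A duplicate would be flagged when the value falls below some threshold, which would have to be tuned on labeled data.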
Review texts are sorted by length and by number of sentiment words, and similarity is then computed only between adjacent texts in the sorted queue, which reduces the amount of computation needed to recognize similar reviews.

The thesis tests the proposed method on reviews of the iPhone 4S collected from the Sina website, and the results demonstrate the effectiveness of the method.
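The sort-then-compare-neighbors step can be sketched as follows. This is an illustration under stated assumptions rather than the thesis's code: the function names, the whitespace tokenization, the `lexicon` set of sentiment words, and the 0.8 threshold are all hypothetical, and Jaccard token overlap is used as a simple stand-in for the thesis's similarity measure.

```python
def sentiment_count(text, lexicon):
    """Number of tokens in the text that appear in the sentiment lexicon."""
    return sum(1 for token in text.lower().split() if token in lexicon)


def jaccard(text_a, text_b):
    """Jaccard overlap of the token sets (stand-in similarity measure)."""
    set_a, set_b = set(text_a.lower().split()), set(text_b.lower().split())
    union = set_a | set_b
    if not union:
        return 1.0  # two empty texts are treated as identical
    return len(set_a & set_b) / len(union)


def adjacent_duplicates(reviews, lexicon, threshold=0.8):
    """Return index pairs of likely duplicates, comparing only sorted neighbors."""
    # Sort review indices by (text length, sentiment-word count).
    order = sorted(
        range(len(reviews)),
        key=lambda i: (len(reviews[i]), sentiment_count(reviews[i], lexicon)),
    )
    pairs = []
    # Only neighbors in the sorted queue are compared.
    for left, right in zip(order, order[1:]):
        if jaccard(reviews[left], reviews[right]) >= threshold:
            pairs.append((left, right))
    return pairs
```

For n reviews this performs n - 1 similarity computations instead of the n(n - 1)/2 needed to compare every pair, at the cost of missing duplicates that do not end up adjacent after sorting.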
Keywords/Search Tags: Spam detection, Product Feature Words, KNN, 2-gram model, KL divergence