
Research And Realization Of Review SPAM Detection System Under Big Data Environment

Posted on: 2016-10-15 | Degree: Master | Type: Thesis
Country: China | Candidate: Y J Zhu
GTID: 2308330503477516 | Subject: Computer software and theory
Abstract/Summary:
With the development of Internet applications, online shopping is gradually becoming a consumer trend. Online reviews, which give consumers an important basis for purchase decisions and help manufacturers improve their products, are an important part of online shopping. Because product sales are directly influenced by the quality of online reviews, a large amount of review spam has been posted online with the malicious intent of misleading consumers. The automatic detection of online review spam has therefore become a research focus. Research on automatic review spam detection was first carried out on English reviews by foreign scholars; however, language differences make those findings difficult to apply to Chinese online review spam detection. This paper therefore studies Chinese online review spam detection and proposes a storage strategy for online reviews in big data environments. The main tasks are as follows:

1) Construction of a Chinese online review corpus. A customized web crawler automatically crawls Chinese reviews from the Internet, and the reviews are stored in the distributed file system HDFS (Hadoop Distributed File System) to ensure reliable storage of massive numbers of online reviews.

2) Development of a Chinese online review spam detection model. Chinese online review spam detection is treated as a text classification problem and solved with a classification model. To avoid the effects of differences among Chinese online reviews, nine features are extracted from the review content to form the feature vector of the classification model. A logistic regression algorithm is then applied to decide whether a review is spam.

3) Acquisition of review topic relevancy. Review topic relevancy quantifies the degree of correlation between a Chinese online review and a review topic.
This paper proposes a review topic word mode based on association rules to improve the recognition of topic words by the Chinese word segmentation system, and calculates review topic relevancy with a mixture language model.

Finally, user comments on the movie "Furious 6" are used to train the classifier. The results show that the proposed model can improve the accuracy of Chinese online review spam detection.
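The abstract names a mixture language model for topic relevancy and logistic regression for spam classification but gives no formulas, so the following is only a minimal sketch under stated assumptions, not the thesis's implementation. Topic relevancy is approximated here as the average log-probability of a review's (already segmented) tokens under a Jelinek-Mercer-style linear mixture of a topic unigram model and a background unigram model, with a hypothetical mixing weight `lam`. The classifier is a plain logistic regression trained by batch gradient descent on the log-loss. All function names, the toy English tokens (standing in for Chinese word-segmentation output), and the three placeholder features (standing in for the nine features, which the abstract does not list) are hypothetical.

```python
import math
from collections import Counter


def unigram_model(tokens):
    """Maximum-likelihood unigram probabilities P(w) from a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def topic_relevancy(review, topic_tokens, background_tokens, lam=0.8):
    """Average log-probability of the review's tokens under the mixture
    lam * P(w | topic) + (1 - lam) * P(w | background).
    Higher (less negative) scores mean the review is closer to the topic.
    The mixture form and lam=0.8 are assumptions, not taken from the thesis."""
    if not review:
        return float("-inf")
    p_topic = unigram_model(topic_tokens)
    p_bg = unigram_model(background_tokens)
    total = 0.0
    for w in review:
        p = lam * p_topic.get(w, 0.0) + (1 - lam) * p_bg.get(w, 0.0)
        # Floor for tokens unseen in both models (arbitrary choice).
        total += math.log(max(p, 1e-9))
    return total / len(review)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic_regression(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Error term: predicted spam probability minus the 0/1 label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def is_spam(x, w, b, threshold=0.5):
    """Label a feature vector as spam when P(spam | x) >= threshold."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= threshold


# Toy usage: tokens stand in for the output of Chinese word segmentation.
topic = "car chase race speed car crew heist".split()
background = "good bad buy price shipping car movie the is".split()
print(topic_relevancy(["car", "chase", "speed"], topic, background))
print(topic_relevancy(["price", "shipping", "buy"], topic, background))

# Three made-up feature values per review (hypothetical); label 1 = spam.
X = [[0.9, 0.1, 0.2], [0.8, 0.2, 0.1], [0.1, 0.9, 0.8], [0.2, 0.8, 0.9]]
y = [0, 0, 1, 1]
w, b = train_logistic_regression(X, y)
print(is_spam([0.15, 0.85, 0.9], w, b))
```

The on-topic toy review scores higher than the off-topic one because its tokens carry most of their probability mass in the topic component. In practice a library classifier such as scikit-learn's `LogisticRegression` would normally replace the hand-written trainer; it is spelled out here only to show the mechanics.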
Keywords/Search Tags:online review spam detection, review content feature, review topic word mode, review topic relevancy