Research On The Key Technology Of Detecting Fake Comments For Electronic Commerce

Posted on: 2016-01-10
Degree: Master
Type: Thesis
Country: China
Candidate: K Zhang
GTID: 2308330482976815
Subject: Information and Communication Engineering

Abstract/Summary:
Comments are important data on an e-commerce platform and play a key role in online business. However, they include a large number of fake comments, which lead to poor decisions and cause heavy losses for customers and organizations. It is therefore necessary to detect and control fake comments to keep the platform healthy and stable.

Given the sheer volume of comment data, existing fake-comment detection methods have clear limitations. An e-commerce platform carries many kinds of goods, and the associated comment data is rich and complex. Content-based methods usually adopt domain-dependent classification features, so their performance relies on a large amount of correctly labeled, domain-specific comment data, and they generalize poorly. Behavior-based methods do not need labeled comment data, but they rely on the comment behavior of specific users, and their recognition rate is low. To address these problems, this thesis proposes a systematic fake-comment detection method for e-commerce platforms. It covers three research points: first, recognizing the target goods that contain fake comments; second, calculating the similarity of comment texts; and third, mining effective fake-comment features and designing a fake-comment detection model. The main work of this thesis is as follows:

1) An algorithm for recognizing target goods, that is, goods that contain fake comments, on an e-commerce platform. The platform holds a large number of comments of complex types, which lowers the precision and recognition rate of traditional fake-review detection algorithms. To obtain sample data and focus the research, we first recognize the target goods. We found that users' rating behavior for every item follows the same statistical law, but that this law is distorted by fake rating behavior.
Using a quantitative index to measure this distortion, we sort the list of goods so that the top-ranked goods are likely to contain many fake comments. The experimental results show that this algorithm sorts the list of goods effectively.

2) An algorithm for calculating comment text similarity. Traditional measures give low accuracy when applied to comment texts. Based on how the content of comment texts is organized, this chapter transforms the full text into a tree structure. The similarity measurement is then decomposed into measurements between corresponding layers of the trees, which ensures that each layer compares words of the same type and that an appropriate similarity measure is used for each layer. Finally, the overall similarity is obtained by combining the per-layer similarities with corresponding weights. Experimental results on a real data set show that the proposed method is more effective and more accurate than other common measures.

3) An integrated-feature-based fake-comment detection method. Existing fake-review detection methods underutilize the dynamic information embedded in users' historical behavior. To address this, a novel method that fuses static and dynamic characteristics is proposed. First, we use a time series analysis model to mine, from the dynamic information, the dynamic characteristics that describe user behavior. Second, we detect suspicious users from these dynamic characteristics together with the users' static characteristics, and then derive each review's suspicion probability from the user's suspicion probability. Finally, we combine the review suspicion probability and the review's static characteristics with a PU-Learning classification strategy to train the fake-review detection model.
Experiments on real data sets show that the proposed method outperforms existing methods.
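
As an illustration of the target-goods ranking described in 1), the following is a minimal sketch and not the thesis's actual algorithm. The abstract does not name the statistical law or the quantitative index, so this sketch assumes a 1-5 star scale, takes the pooled rating distribution of all goods as the reference, and uses Jensen-Shannon divergence as the index. All function names are hypothetical.

```python
import math
from collections import Counter

STARS = (1, 2, 3, 4, 5)  # assumed rating scale; not stated in the abstract


def rating_distribution(ratings, smoothing=1.0):
    """Share of each star level, with additive smoothing to avoid zeros."""
    counts = Counter(ratings)
    total = len(ratings) + smoothing * len(STARS)
    return [(counts.get(s, 0) + smoothing) / total for s in STARS]


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and bounded, used here as the index."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def rank_goods(ratings_by_item):
    """Sort items by how far their rating distribution deviates from the pool.

    ratings_by_item maps an item id to its list of star ratings.
    Returns (item, score) pairs, most deviant first.
    """
    pooled = [r for ratings in ratings_by_item.values() for r in ratings]
    reference = rating_distribution(pooled)
    scores = {
        item: js_divergence(rating_distribution(ratings), reference)
        for item, ratings in ratings_by_item.items()
    }
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
```

Items at the top of the returned list are only candidates for closer inspection. A large deviation may also come from a small sample or a genuinely polarizing product, so in practice a minimum number of ratings per item would probably be required.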
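
The layer-wise similarity described in 2) can be sketched in a similar spirit. The abstract does not specify the tree layers, the per-layer measures, or the weights, so everything concrete here is an assumption. Each comment is represented as a dictionary with one word list per layer (a flattened stand-in for the tree). The layer names `aspect`, `opinion`, and `other` are hypothetical, and Jaccard and cosine similarity stand in for the "appropriate" per-layer measures.

```python
import math
from collections import Counter


def jaccard(words_a, words_b):
    """Set overlap; empty layers contribute no similarity."""
    set_a, set_b = set(words_a), set(words_b)
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


def cosine(words_a, words_b):
    """Cosine similarity of word-count vectors; 0.0 if either layer is empty."""
    count_a, count_b = Counter(words_a), Counter(words_b)
    dot = sum(count_a[w] * count_b[w] for w in count_a.keys() & count_b.keys())
    norm_a = math.sqrt(sum(v * v for v in count_a.values()))
    norm_b = math.sqrt(sum(v * v for v in count_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical layers: name -> (similarity measure, weight).
LAYERS = {
    "aspect": (jaccard, 0.5),
    "opinion": (cosine, 0.3),
    "other": (cosine, 0.2),
}


def tree_similarity(tree_a, tree_b, layers=LAYERS):
    """Weighted combination of per-layer similarities, normalized to [0, 1]."""
    total_weight = sum(weight for _, weight in layers.values())
    score = sum(
        weight * measure(tree_a.get(name, []), tree_b.get(name, []))
        for name, (measure, weight) in layers.items()
    )
    return score / total_weight
```

Comparing each layer only with its counterpart keeps unlike word types from being measured against each other, which is the point the abstract makes. The weights above are placeholders and would need to be tuned on labeled data.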
Keywords/Search Tags: Fake Comments, Rate Distribution, Tree Structure, Similarity Measurement, Timing Analysis, Fusion Feature, PU-Learning