| With the rapid development of the Internet, electronic commerce has become part of the fabric of people’s lives. Consumers routinely consult previous users’ comments before making purchase decisions, and their choices can be swayed by those opinions. Comments are also easy to publish on shopping platforms at very little cost. This is one reason some sellers hire an "Online Water Army" to post positive comments that promote their own products or negative comments that defame competitors’ products. Such fabricated comments mislead consumers and undermine the order of Internet commerce, so it is necessary to identify and remove them. However, a person without sufficient experience or time can hardly judge the authenticity of a single comment, and manual review suffers from both high cost and low accuracy. This paper studies techniques for recognizing false comments and examines a gold-standard set of false comments through both observation and experiments. State-of-the-art machine learning has inherent advantages in addressing these problems. This paper proposes two novel approaches based on the Co-training machine learning model. The first, the CoSpa model, trains classifiers on terms and PCFG rules; the second, the CoFea model, trains on terms only, divided into equal groups by entropy. The experiments show that the new approaches have advantages over commonly used methods. The CoSpa-U algorithm achieves an identification accuracy of 90% and the CoSpa-C algorithm achieves 85%; both strategies of the CoSpa algorithm outperform the reference SVM classifier, which has an identification accuracy of 75-80%. The CoFea-T algorithm achieves an identification accuracy of 83%, slightly higher than the 80% of the CoFea-U algorithm; both strategies of the CoFea algorithm also outperform the same SVM classifier. Overall, the results show that the CoSpa algorithm yields higher identification accuracy than the CoFea algorithm, while the CoFea algorithm requires less running time. This research provides a promising idea and method for addressing the problem of false online comments. |