| With the maturing of Web 2.0, the Internet has penetrated all aspects of people's daily lives. At the same time, with the rapid development of virtual communities, people can freely publish personal opinions and views there, and consumers can make purchasing decisions more easily by reading other consumers' comments, which contain a large amount of useful information. About 80% of the information in virtual communities is text data. This data contains assessments of products and services, which consumers can use to understand a product or service and make a purchasing decision, so the data is of great significance. The development of text mining technology provides good tools for processing the unstructured text data in virtual communities. Text mining can be divided into text classification and text clustering, and within text classification, sentiment classification has developed rapidly in recent years. Preprocessing is the key step in text mining, and a variety of methods are commonly used for it. This paper is a comparative study of the different combinations of five classical feature selection algorithms (Document Frequency, DF; Information Gain, IG; Mutual Information, MI; the χ² statistic, CHI; Weight of Evidence for Text, WET) and three common weighting schemes (Boolean Weight, BW; Term Frequency, TF; TF-IDF) in text classification preprocessing. A Support Vector Machine (SVM) was selected as the evaluating classifier. We found that the combination of IG and TF-IDF performed well in our tests, and that the combination of WET and TF performed poorly.
We analyzed the reasons theoretically. With the maturity of Web 2.0, information exchange and information sharing have reached an unprecedented breadth and depth, which also provides favorable conditions for the spread of false information in virtual communities. Through the analysis of a large number of comments from virtual communities, this paper presents an approach to analyzing the credibility of virtual community comments based on text similarity computing, in order to quickly identify virtual communities that may contain false information. The results can serve as a basis that helps organizations and individuals make correct decisions quickly, and they also offer some advice to virtual community supervisors on managing their communities. The dissemination of false information online occurs not only in virtual communities, but also in news reports, online media, and so on. |
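
The abstract does not state which similarity measure the credibility analysis relies on. As an illustrative sketch only, assuming cosine similarity over TF-IDF vectors and a hypothetical threshold of 0.8, near-duplicate comments (one possible signal of low-credibility content) could be flagged as below. The function names `tokenize`, `tfidf_vectors`, `cosine`, and `suspicious_pairs` are invented for this example and are not taken from the paper.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase a comment and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(comments):
    """Return one sparse TF-IDF vector (dict: term -> weight) per comment."""
    token_lists = [tokenize(c) for c in comments]
    n_docs = len(token_lists)
    # Document frequency: number of comments that contain each term.
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        # Smoothed IDF (an assumption of this sketch) keeps every weight positive.
        vectors.append({
            term: tf * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, tf in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def suspicious_pairs(comments, threshold=0.8):
    """Return (i, j, similarity) for comment pairs at or above the threshold."""
    vectors = tfidf_vectors(comments)
    pairs = []
    for (i, u), (j, v) in combinations(enumerate(vectors), 2):
        sim = cosine(u, v)
        if sim >= threshold:
            pairs.append((i, j, sim))
    return pairs
```

Calling `suspicious_pairs(list_of_comments)` returns index pairs of comments whose wording is nearly identical. The threshold value and the simple regex tokenizer are placeholders that would need tuning for real community data, in particular for languages without whitespace word boundaries.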