With the rapid growth of the Internet, people increasingly depend on social networks and online communication tools to interact. Because members of online social networks can remain anonymous and concealed, the Internet has become a new venue for illegal trading. As a result, when investigative organizations handle criminal cases, authenticating online authorship is given the highest priority. However, criminals often register with false information to avoid detection, so it is very difficult to determine an author's true identity from registration information alone. This poses new challenges for the study of online authorship identification.

Taking www.tianya.com, a well-known social forum, as the research object, this paper presents a detailed study of online authorship identification based on Chinese semantic and structural features. The main contributions are as follows:

1. By analyzing the writing style of Chinese online messages and the extracted feature set, we propose an authorship identification method based on a hypothesis testing model. This method builds a feature set tailored specifically to Chinese online messages and evaluates the similarity between samples by hypothesis testing. Our experimental results show that the method has practical significance and application value in the identification field.

2. To address the complexity and brevity of online messages, this paper proposes a second identification method that embeds a genetic algorithm (GA) in a Support Vector Machine (SVM) model. SVMs handle high-dimensional classification problems well, and a GA can search for an optimal solution to a problem. We combine the two methods and apply them to online authorship identification. Experimental results show that this method achieves better recognition performance, with fewer selected features, higher recognition accuracy, and shorter detection time.
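
To make the hypothesis-testing idea in the first contribution concrete, the following is a minimal sketch only. The abstract does not say which test is used, so this assumes a Pearson chi-square test of homogeneity over per-feature counts (for example, counts of function words or punctuation marks). The function names `chi_square_statistic` and `same_author`, and the caller-supplied `critical_value`, are hypothetical and not taken from the paper.

```python
from typing import Sequence


def chi_square_statistic(counts_a: Sequence[int], counts_b: Sequence[int]) -> float:
    """Pearson chi-square statistic for a 2 x k table of feature counts.

    Assumption: each position holds the count of the same stylistic feature
    in two writing samples. A value near 0 means similar feature profiles.
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("both samples must use the same feature set")
    total_a, total_b = sum(counts_a), sum(counts_b)
    if total_a == 0 or total_b == 0:
        raise ValueError("each sample needs at least one observed feature")
    grand_total = total_a + total_b

    statistic = 0.0
    for a, b in zip(counts_a, counts_b):
        column_total = a + b
        if column_total == 0:
            continue  # a feature absent from both samples carries no evidence
        # Expected counts if both samples shared one feature distribution.
        expected_a = total_a * column_total / grand_total
        expected_b = total_b * column_total / grand_total
        statistic += (a - expected_a) ** 2 / expected_a
        statistic += (b - expected_b) ** 2 / expected_b
    return statistic


def same_author(counts_a: Sequence[int], counts_b: Sequence[int],
                critical_value: float) -> bool:
    """Fail to reject 'same distribution' when the statistic is below the threshold."""
    return chi_square_statistic(counts_a, counts_b) < critical_value
```

With three features there are two degrees of freedom, so a caller testing at the 5% level would pass `critical_value=5.991`. Failing to reject the null hypothesis is only weak evidence of common authorship, so a real system would need thresholds calibrated on labeled data.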