Research On The Author Style Classification And Recognition Technology Of Web Information

Posted on: 2015-08-10  Degree: Master  Type: Thesis
Country: China  Candidate: R Huang  Full Text: PDF
GTID: 2308330461997081  Subject: Computer technology
Abstract/Summary:
In recent years, with the rapid development of Internet technology and the growing popularity of the Internet, the information available online has become increasingly rich. The rapid growth of information resources on the Internet gives us a convenient and efficient way to obtain all kinds of information. On the other hand, the spread of harmful information has become a major problem. Research on techniques for determining author identity, based on classifying and recognizing the author style of Web information, can therefore help relevant institutions obtain identifying information about the senders of illegal information, provide a basis for computer forensics, and serve as an effective means of cleaning up the Internet environment.

Given this research background and social need, this study investigates author style classification and recognition technology for Web information. Building on an analysis of various data mining techniques and statistical tools, the thesis proposes a Web text author style classification and recognition tool with self-learning ability. The tool first extracts text from the network, then extracts features, and then applies sample training and a support vector machine (SVM) classification algorithm to assign each text automatically to the author style category it belongs to. The SVM-based method is divided into two steps. The first is the classification algorithm: the SVM partitions the feature space so as to classify the authors. The second is the similarity algorithm: deciding whether texts come from the same author is transformed into a similarity computation between vectors in N-dimensional space, where the degree of similarity is given by the cosine of the angle between the vectors and is used to determine whether the texts belong to the same category.
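The similarity step described above can be illustrated with a minimal Python sketch. The abstract does not say which features the thesis uses or how the same-author decision is made from the cosine value, so the word-count features in `style_vector`, the function names, and the `threshold` of 0.8 are all assumptions made for illustration only.

```python
import math
from collections import Counter


def style_vector(text, vocabulary):
    """Map a text to an N-dimensional vector of word counts.

    Assumption: plain word counts over a fixed vocabulary stand in for
    whatever style features the thesis actually extracts.
    """
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an all-zero vector has no direction; treat as dissimilar
    return dot / (norm_u * norm_v)


def same_author(u, v, threshold=0.8):
    """Hypothetical decision rule: same category if similarity >= threshold."""
    return cosine_similarity(u, v) >= threshold
```

Vectors that point in the same direction give a cosine close to 1 regardless of text length, which is why the measure suits texts of different sizes; orthogonal vectors give 0.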
Analysis of the experimental system shows that its misclassified samples are distributed roughly evenly across the categories, with no excessive concentration of errors in any specific category. Judged by macro-averaged precision, the overall accuracy of the system is good. For recall and its macro average, there is likewise no excessive concentration of errors.
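For readers unfamiliar with the metrics named above, the following Python sketch shows how macro-averaged precision and recall are commonly computed: per-category values are calculated first and then averaged with equal weight per category. The abstract reports no figures, so the function name and the labels in the usage note are hypothetical and are not taken from the thesis.

```python
def macro_precision_recall(true_labels, predicted_labels):
    """Return (macro precision, macro recall) over all categories seen."""
    classes = sorted(set(true_labels) | set(predicted_labels))
    if not classes:
        return 0.0, 0.0
    pairs = list(zip(true_labels, predicted_labels))
    precisions, recalls = [], []
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)  # correct hits
        fp = sum(1 for t, p in pairs if t != c and p == c)  # wrongly assigned to c
        fn = sum(1 for t, p in pairs if t == c and p != c)  # missed members of c
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    n = len(classes)
    return sum(precisions) / n, sum(recalls) / n
```

With the made-up labels `true = ["A", "A", "B", "B"]` and `predicted = ["A", "B", "B", "B"]`, category A has precision 1 and recall 0.5, category B has precision 2/3 and recall 1, so the macro averages are about 0.833 and 0.75. Because every category counts equally, a cluster of errors in one category lowers the macro average visibly, which is what makes it a reasonable check for error concentration.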
Keywords/Search Tags: Web information, support vector machine (SVM) classification, feature extraction, classification of style