Research Of Web Text Classification Algorithm Based LSI And SVC
Posted on:2011-03-13 | Degree:Master | Type:Thesis | Country:China | Candidate:H Huang | Full Text:PDF | GTID:2178330332462711 | Subject:Computer application technology
Abstract/Summary:
With the rapid growth of the World Wide Web, web text classification is becoming an increasingly important method for organizing and processing the large volume of web texts on the Internet. Classification performance is directly affected by the results of text preprocessing. This thesis studies the preprocessing and classification of web text in order to improve the precision and recall of the web text classification algorithm. The following aspects are studied:
First, it introduces the concepts and significance of web text classification, presents several recent methods, analyzes the existing problems of web text classification algorithms, and discusses the prospects for future development of web text classification technologies.
Second, it uses latent semantic indexing (LSI) theory to reduce the dimensionality of the web text feature matrix. LSI transforms the word-frequency matrix into a singular value matrix using singular value decomposition (SVD). By replacing synonymous terms with a common root term, LSI reduces the dimensionality of the web text feature vectors and, with it, the computational cost of classification.
Third, it applies support vector clustering (SVC) to web text classification. SVC is a clustering algorithm suited to small samples; it can handle clusters of various forms, does not require the number of clusters to be specified in advance, and handles high-dimensional text feature vectors easily with few parameters. In view of the characteristics of web text classification, the samples are trained with this small-sample method to reduce storage requirements and the time needed for subsequent training. The experiment indicates that this method can improve precision.
This thesis proposes a web text categorization method based on the combination of LSI and SVC to improve the precision of web text classification, and demonstrates the feasibility and validity of the method.
Keywords/Search Tags: Web Text classification, Text clustering, Feature selection, SVC, LSI
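The LSI dimensionality-reduction step described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function name `lsi_reduce`, the toy term-by-document counts, and the choice of k = 2 are all assumptions made for illustration. It applies a truncated SVD to a small word-frequency matrix and compares documents in the reduced space.

```python
import numpy as np

def lsi_reduce(term_doc, k):
    """Truncated SVD of a term-by-document matrix (hypothetical helper).

    Returns the top-k term directions, the top-k singular values, and the
    documents expressed as k-dimensional vectors (one row per document).
    """
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    k = min(k, len(s))
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
    doc_vectors = (s_k[:, None] * Vt_k).T  # shape: (n_docs, k)
    return U_k, s_k, doc_vectors

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy word-frequency matrix (made-up counts): rows are terms, columns are documents.
# Documents 0-2 concern vehicles, documents 3-4 concern fruit.
terms = ["car", "automobile", "engine", "apple", "fruit"]
X = np.array([
    [2, 0, 1, 0, 0],
    [0, 2, 1, 0, 0],
    [1, 1, 2, 0, 0],
    [0, 0, 0, 2, 1],
    [0, 0, 0, 1, 2],
], dtype=float)

U_k, s_k, docs_k = lsi_reduce(X, k=2)

raw_sim = cosine(X[:, 0], X[:, 1])          # documents 0 and 1, original term space
latent_sim = cosine(docs_k[0], docs_k[1])   # same documents, 2-dimensional LSI space
cross_sim = cosine(docs_k[0], docs_k[3])    # vehicle document vs. fruit document

print("raw:", raw_sim, "latent:", latent_sim, "cross-topic:", cross_sim)
```

In this toy example, documents 0 and 1 use the near-synonyms "car" and "automobile" and share only one term, so they look fairly dissimilar in the original term space. After projection onto two latent dimensions they end up close together, which is the effect the abstract attributes to LSI. The reduced vectors in `docs_k` are what a downstream clustering step would take as input.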