
The Research Of Web Pages Classification Based On SVM Technique

Posted on: 2012-01-30
Degree: Master
Type: Thesis
Country: China
Candidate: G Q Wang
Full Text: PDF
GTID: 2178330338493790
Subject: Computer Science and Technology
Abstract/Summary:
With the development of the internet, it has become the main source of access to information. To obtain this information effectively, we want to classify web pages automatically. Web page classification is therefore an important technology for organizing the large amount of information on the internet, and machine learning makes this classification automatic. Among the various web page classification methods, SVM has become a research hot spot because of its excellent learning capacity.

First, we introduce the theory, principles and training algorithms of SVM. We then summarize the advantages of SVM-based web page classification and elaborate on the training algorithms for imbalanced SVM and multi-class SVM. We modify the standard fuzzy SVM model by introducing a parameter λ that controls the position of the hyperplane and shifts it toward the category that has more training samples. The algorithm constructs the membership function from the distances between samples and class centers, which reflects the sample distribution better and reduces the effect of noisy data.

To solve the problem of designing the binary hierarchical structure for multi-class classification, we use an improved k-means algorithm to build that structure. The improved k-means algorithm minimizes the separability within each macro-class and maximizes the separability between the two macro-classes, which improves the precision of web page classification.

Finally, we apply the improved SVM training algorithms in a Chinese web page classification system and test their accuracy. The results indicate that the improved algorithms achieve higher classification accuracy.
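The abstract does not give the membership function itself, so the following is only a minimal sketch of one common distance-to-center formulation used with fuzzy SVM, not the thesis's exact method. The formula s_i = 1 - d_i / (r + delta), the helper names, and the toy data are all assumptions made for illustration.

```python
import math

def class_center(samples):
    """Mean vector of a list of equal-length feature vectors."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]

def memberships(samples, delta=1e-6):
    """Assumed membership: s_i = 1 - d_i / (r + delta).

    d_i is the Euclidean distance from sample i to its class center and
    r is the largest such distance in the class. Samples far from the
    center (likely noise) receive a membership close to 0.
    """
    center = class_center(samples)
    dists = [math.dist(x, center) for x in samples]
    radius = max(dists)
    return [1.0 - d / (radius + delta) for d in dists]

# Hypothetical toy class: three clustered points plus one outlier.
positive = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [4.0, 4.0]]
weights = memberships(positive)
```

Memberships of this kind are typically used as per-sample weights on the slack penalty during training, so that outliers influence the hyperplane less. In scikit-learn, for example, they could be passed through the `sample_weight` argument of `SVC.fit`.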
Keywords/Search Tags: SVM, web page classification, imbalanced training data set, membership function, multi-class classification