
The Research Of Web Pages Classification Based On Support Vector Machine Technique

Posted on: 2010-06-16  Degree: Master  Type: Thesis
Country: China  Candidate: K W Liu
GTID: 2178360278960984  Subject: Computer application technology
Abstract/Summary:
With the development of the Internet, automatic web page classification is needed to organize and analyse large volumes of information efficiently. Among the various web page classification methods, the Support Vector Machine (SVM) has become a research hot spot because of its excellent learning capacity. We first introduce the development of the SVM, its working theory, and related technologies, and then summarize its role in web page classification. To address the long training time and the large number of iterations required when training on high-dimensional datasets, we propose a new SVM training algorithm based on three-point iteration. It increases the number of sample points optimized in each iteration from two to three, reducing the number of iterations and the training cost. To address the shortcomings of the classic SVM training algorithm, we study the theory of SVM incremental learning, analyse its loss of useful information from the sample sets, and propose an improved method based on hyperplane distance. Based on the geometric distribution characteristics of the SVM, we use a hyperplane distance pre-selection method to choose the samples most likely to be support vectors for incremental learning. While retaining the useful information, this method reduces the number of incremental learning samples and improves the training speed of incremental learning. Finally, we apply the improved SVM learning algorithm to a web page classification system and compare and analyse its classification performance. The results indicate that the proposed method achieves higher efficiency and accuracy.
Keywords/Search Tags: Support Vector Machine, Web page classification, SMO algorithm, Incremental learning, Hyperplane distance
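
The hyperplane distance pre-selection described in the abstract can be illustrated with a minimal sketch. The abstract does not give the thesis's exact selection rule, so the fixed `max_distance` threshold, the function names, and the sample data below are all assumptions made for illustration only. The idea shown is simply that new samples lying close to the current separating hyperplane w·x + b = 0 are the ones most likely to become support vectors, so only those are passed on to the incremental training step.

```python
import math


def hyperplane_distance(w, b, x):
    """Geometric distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def preselect(w, b, samples, max_distance):
    """Keep the (x, y) pairs that lie within max_distance of the hyperplane.

    max_distance is a hypothetical tuning parameter, not a value
    taken from the thesis.
    """
    return [(x, y) for x, y in samples
            if hyperplane_distance(w, b, x) <= max_distance]


# Hypothetical current model and incoming batch (illustrative values only).
w, b = [1.0, 1.0], -1.0
new_batch = [
    ([0.5, 0.5], 1),     # lies on the hyperplane
    ([3.0, 3.0], 1),     # far on the positive side
    ([0.0, 0.0], -1),    # close, on the negative side
    ([-4.0, -2.0], -1),  # far on the negative side
]

candidates = preselect(w, b, new_batch, max_distance=1.0)
print(candidates)
```

In an incremental setting, the retained candidates would typically be combined with the support vectors of the current model and used for retraining, while the distant samples are discarded. How the threshold should be chosen is not stated in the abstract.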