The Web Pages Classification Method Based On Semi-supervised Support Vector Machine

Posted on: 2011-01-25 | Degree: Master | Type: Thesis
Country: China | Candidate: C G Wu
GTID: 2178360308490376 | Subject: Computer Science and Technology
Abstract/Summary:
With the development of the Internet, web pages need to be classified automatically so that the large volume of available information can be organized and analyzed efficiently. Among the various web page classification methods, the support vector machine (SVM) has become a research hotspot because of its excellent learning capacity. In real-world settings, however, most data are unlabeled, and labeling data is time-consuming. This has pushed machine learning research into a new phase: semi-supervised learning, which combines labeled and unlabeled data, is becoming a new hotspot.

First, the thesis describes the web page classification process, the classification types, and the evaluation criteria. It then introduces the development of the support vector machine, its working principles, and related techniques, and summarizes the important role of the SVM in web page classification. Next, to address long training times and low accuracy, it studies the training algorithms of the SVM and of the semi-supervised SVM, as well as sample-selection strategies for active learning. On this basis, a semi-supervised SVM learning algorithm based on active learning is proposed. It trains an initial learner on a small amount of labeled data, uses active learning to select the best samples for further training, and reduces learning cost by deleting non-support vectors. The algorithm can achieve good learning performance at a lower learning cost.

Finally, the improved semi-supervised SVM learning algorithm is applied in a web page classification system, and its classification performance is compared and analyzed. The results indicate that the proposed method achieves higher efficiency and accuracy.
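The abstract describes the proposed algorithm only at a high level: train an initial learner on a small labeled set, use active learning to pick the most useful samples, and delete non-support vectors to cut training cost. A minimal sketch of such a loop in plain Python follows. It is a hypothetical illustration, not the thesis's implementation. The Pegasos-style linear SVM solver, the uncertainty-sampling rule (query the pool points with the smallest |f(x)|), the pruning rule (drop samples with y·f(x) > 1), the simulated `oracle`, and every function name and parameter (`train_linear_svm`, `prune_non_support_vectors`, `active_semi_supervised_svm`, `rounds`, `k`, `lam`) are assumptions. In this sketch, the unlabeled pool is used only to choose which samples get labeled.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def augment(x):
    # Append a constant 1.0 so the last weight acts as a (regularized) bias.
    return list(x) + [1.0]


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos-style stochastic subgradient descent for a linear SVM.

    X: augmented feature vectors; y: labels in {-1, +1}. Approximately
    minimizes lam/2 * ||w||^2 + mean(max(0, 1 - y * w.x)).
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * dot(w, X[i]) < 1
            # Regularization shrink: (1 - eta * lam) * w == (1 - 1/t) * w.
            w = [(1.0 - 1.0 / t) * wj for wj in w]
            if violated:
                # Hinge-loss subgradient step for a margin violator.
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def prune_non_support_vectors(X, y, w, tol=1e-6):
    """Keep samples on or inside the margin (y * f(x) <= 1); drop the rest."""
    keep = [i for i in range(len(X)) if y[i] * dot(w, X[i]) <= 1 + tol]
    # Guard: an approximate solver can leave a class with no such sample,
    # so keep that class's lowest-margin sample to retain both classes.
    for label in (-1, 1):
        if not any(y[i] == label for i in keep):
            cands = [i for i in range(len(X)) if y[i] == label]
            if cands:
                keep.append(min(cands, key=lambda i: y[i] * dot(w, X[i])))
    return [X[i] for i in keep], [y[i] for i in keep]


def active_semi_supervised_svm(X_lab, y_lab, X_pool, oracle, rounds=3, k=2):
    """Seed on a small labeled set, query k pool points per round, prune."""
    X_work = [augment(x) for x in X_lab]
    y_work = list(y_lab)
    pool = [list(x) for x in X_pool]
    queried = 0
    w = train_linear_svm(X_work, y_work)  # initial learner from the seed set
    for _ in range(rounds):
        if not pool:
            break
        # Uncertainty sampling: |w.x| ranks pool points by distance to the hyperplane.
        pool.sort(key=lambda x: abs(dot(w, augment(x))))
        batch, pool = pool[:k], pool[k:]
        for x in batch:
            X_work.append(augment(x))
            y_work.append(oracle(x))  # simulated annotator supplies the label
            queried += 1
        w = train_linear_svm(X_work, y_work)
        # Drop non-support vectors so the next round trains on a smaller set.
        X_work, y_work = prune_non_support_vectors(X_work, y_work, w)
    return w, queried, len(X_work)


def predict(w, x):
    return 1 if dot(w, augment(x)) >= 0 else -1


def make_cluster(rng, center, n):
    # Uniform noise keeps each point inside a box of half-width 1 around center.
    return [[c + rng.uniform(-1, 1) for c in center] for _ in range(n)]


def oracle(x):
    # Toy ground truth standing in for a human labeler (hypothetical).
    return 1 if x[0] + x[1] > 0 else -1


rng = random.Random(42)
pos = make_cluster(rng, (3.0, 3.0), 20)
neg = make_cluster(rng, (-3.0, -3.0), 20)
w, queried, n_work = active_semi_supervised_svm(
    [pos[0], neg[0]], [1, -1], pos[1:] + neg[1:], oracle)
print(queried, n_work, predict(w, [2.5, 3.5]), predict(w, [-3.5, -2.5]))
```

The pruning step mirrors the abstract's idea of deleting non-support vectors: in an exact SVM with a summed hinge loss, removing samples strictly outside the margin leaves the solution unchanged. With an approximate solver and the averaged loss used here it is only a heuristic, which is why the sketch keeps at least one sample per class.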
Keywords/Search Tags: Support vector machine, Semi-supervised learning, Web page classification, Active learning, Best training samples