Research And Application Of News Automatic Classification Technology Based On Support Vector Machines

Posted on: 2009-03-30
Degree: Master
Type: Thesis
Country: China
Candidate: X Yi
Full Text: PDF
GTID: 2178360308979731
Subject: Computer application technology
Abstract/Summary:
In recent years, information processing has become increasingly important for obtaining useful information. Text categorization, the automated assignment of natural-language texts to predefined categories based on their content, is a task of growing importance. With text categorization technology, large-scale text data can be processed rapidly, and the utilization and usability of information are greatly improved. At present, text categorization systems usually rely on statistical and machine learning methods: they analyze the content of a text, judge its similarity, and assign a category at the semantic level.

In this thesis, building on a study of text categorization, the theory of support vector machines (SVM), which is grounded in statistical learning theory, is examined in depth. An improved bi-directional maximum matching algorithm based on a bi-dictionary and a stop-word elimination algorithm based on dynamic dictionaries are proposed. The two algorithms improve the accuracy of text preprocessing, remove the vast majority of useless terms, and make the feature vector of a text more accurate. Improving preprocessing accuracy reduces the impact of useless terms as far as possible and raises the quality of the input to the SVM classifier, so that the SVM classification results are as accurate as possible.

An SVM multi-class classification algorithm based on an improved polynomial kernel is also proposed. The algorithm handles multi-class classification well, makes effective use of the manually defined categories, and completes training and classification in a short time. Finally, a news automatic classification system is designed and implemented with the three algorithms, and evaluations and results are given.
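The preprocessing step described above can be illustrated with a minimal sketch of textbook bi-directional maximum matching followed by stop-word filtering. This is not the thesis's improved algorithm: the abstract does not describe the bi-dictionary scheme or the dynamic stop-word dictionaries, so the sketch assumes a single static dictionary and a static stop-word set. The function names, the tie-breaking rule (fewer tokens, then fewer single-character tokens, then the backward result), and the sample vocabulary are all assumptions made for illustration.

```python
def forward_max_match(text, dictionary, max_len):
    """Scan left to right, taking the longest dictionary word at each position."""
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted so the scan cannot stall.
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def backward_max_match(text, dictionary, max_len):
    """Scan right to left, taking the longest dictionary word ending at each position."""
    tokens = []
    j = len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                j -= size
                break
    tokens.reverse()
    return tokens


def bidirectional_max_match(text, dictionary):
    """Run both scans and keep the segmentation that looks more plausible.

    Assumed heuristic: fewer tokens wins; on a tie, fewer single-character
    tokens wins; if still tied, the backward result is returned.
    """
    max_len = max((len(word) for word in dictionary), default=1)
    fwd = forward_max_match(text, dictionary, max_len)
    bwd = backward_max_match(text, dictionary, max_len)
    if len(fwd) != len(bwd):
        return fwd if len(fwd) < len(bwd) else bwd
    singles_fwd = sum(1 for token in fwd if len(token) == 1)
    singles_bwd = sum(1 for token in bwd if len(token) == 1)
    return fwd if singles_fwd < singles_bwd else bwd


def remove_stop_words(tokens, stop_words):
    """Drop tokens that appear in the stop-word set."""
    return [token for token in tokens if token not in stop_words]


if __name__ == "__main__":
    # Hypothetical toy vocabulary, not taken from the thesis.
    vocabulary = {"研究", "研究生", "生命", "起源"}
    print(bidirectional_max_match("研究生命起源", vocabulary))
    print(remove_stop_words(bidirectional_max_match("生命的起源", vocabulary), {"的"}))
```

In the first sample sentence the forward scan greedily takes the three-character word and leaves a stray single character, while the backward scan yields three two-character words; the tie-break on single-character tokens is what selects the backward result. The remaining tokens would then be turned into a feature vector for the SVM classifier.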
Keywords/Search Tags:text categorization, support vector machines, bi-direction maximum matching, stop words, SVM classifier