The Application Research Of Support Vector Machine Theory In Text Categorization

Posted on: 2008-05-06
Degree: Master
Type: Thesis
Country: China
Candidate: Z L Zhou
Full Text: PDF
GTID: 2178360212990232
Subject: Computer application technology
Abstract/Summary:
With the rapid development and spread of the Internet, the volume of electronic information has grown enormously. How to collect and find the information a user is interested in, and how to discover latent, useful knowledge, has become a hot topic in information science and technology. Data mining (DM) is a new research field that addresses this problem. Structured data, such as relational databases, has been the main object of DM research, but in practice most information exists as unstructured data, so mining unstructured information has emerged as a new challenge for DM.
Among common kinds of unstructured data, such as text, images, and video, text is the most widely used form. It appears in digital libraries, newsgroups, and the home pages of organizations and individuals. As the amount of text grows, organizing and processing large collections of documents, and finding the information a user wants quickly, accurately, and completely, becomes a major challenge for information science and technology. Text categorization is a key technology for organizing and processing large document collections: it can largely solve the problem of information disorder and makes it easier for users to find the information they need quickly.
To address these problems, the main work of this paper covers the following three aspects:
First, the concept of data mining and the techniques related to text categorization in data mining are analyzed, and the mutual information method used in the feature selection phase is improved.
Second, Support Vector Machine theory is studied in depth. Several active research problems are discussed, including training algorithms, classification algorithms, and multi-class algorithms, and the current state of research on and application of Support Vector Machines is reviewed.
Third, the application of Support Vector Machine theory to text categorization is improved.
The traditional Support Vector Machine (SVM) cannot cope with a text database that is updated over time. To address this problem, this paper proposes a novel approach: an incremental SVM text categorization algorithm. It uses the Karush-Kuhn-Tucker (KKT) conditions to analyze how the support vector set may change after new text sets are added to the training set. The experimental results show that the new algorithm achieves the same classification and generalization ability as the traditional SVM. Finally, we point out some important issues that merit further research.
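The abstract does not spell out the incremental algorithm, so the following is only a minimal sketch of the general idea of checking new samples against the KKT conditions. It assumes a linear decision function f(x) = w . x + b and labels in {+1, -1}. A new sample enters with multiplier alpha = 0, for which the KKT conditions require y * f(x) >= 1; samples that fail this test may change the support vector set. The helper names (`linear_decision`, `kkt_violators`, `retraining_set`) and the choice to retrain on the old support vectors plus the violators are illustrative assumptions, not the thesis's actual method.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def linear_decision(w: Vector, b: float) -> Callable[[Vector], float]:
    """Return f(x) = w . x + b (hypothetical helper for a linear SVM)."""
    def f(x: Vector) -> float:
        return sum(wi * xi for wi, xi in zip(w, x)) + b
    return f


def kkt_violators(f: Callable[[Vector], float],
                  new_X: Sequence[Vector],
                  new_y: Sequence[int],
                  tol: float = 1e-9) -> List[int]:
    """Indices of new samples that violate the KKT conditions.

    A new sample starts with alpha = 0, so it is consistent with the
    current solution only if y * f(x) >= 1.
    """
    return [i for i, (x, y) in enumerate(zip(new_X, new_y))
            if y * f(x) < 1.0 - tol]


def retraining_set(f: Callable[[Vector], float],
                   support_X: Sequence[Vector],
                   support_y: Sequence[int],
                   new_X: Sequence[Vector],
                   new_y: Sequence[int]
                   ) -> Optional[Tuple[List[Vector], List[int]]]:
    """Pick the data for the next training round (assumed heuristic).

    Returns None when no new sample violates the KKT conditions, in which
    case the current classifier is kept. Otherwise returns the old support
    vectors plus the violating new samples.
    """
    idx = kkt_violators(f, new_X, new_y)
    if not idx:
        return None
    X = list(support_X) + [new_X[i] for i in idx]
    y = list(support_y) + [new_y[i] for i in idx]
    return X, y
```

If no new sample violates the conditions, the existing solution remains optimal for the enlarged training set, so no retraining is needed. Retraining only on the old support vectors plus the violators is a common simplification; it can miss old non-support vectors that would become support vectors after the update.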
Keywords/Search Tags: data mining, text categorization, support vector machine, KKT conditions, incremental