Design And Implementation Of Automated Text Categorization In Information Domain

Posted on: 2009-11-02
Degree: Master
Type: Thesis
Country: China
Candidate: J Cen
GTID: 2178360272978154
Subject: Computer software and theory
Abstract/Summary:
With the development of information technology, information stored as text has appeared in massive quantities on the Internet, in digital libraries, and in all kinds of electronic books and periodicals. How to find the necessary information quickly and accurately has become a research hotspot in recent years. Automated text categorization assigns texts to classes according to their content, which makes it well suited to finding the needed information efficiently in massive text collections. It is one of the most effective methods for solving this problem.

This thesis takes the reorganization and processing of military information as its application background. It mainly uses the SVM classification algorithm and a dictionary-based word segmentation method to design and implement an automated text categorization system for the information domain. The system classifies military information automatically, addresses the key technologies of military information processing, and provides a text categorization algorithm and model for military information systems.

The research results are described as follows:
1) Based on the common models of text categorization, the key technologies of a text categorization system are analyzed.
2) Based on the overall design of the text categorization system, and after comparing various classification algorithms and Chinese word segmentation techniques and taking into account the characteristics of the information domain itself, the military information categorization system is implemented using the SVM algorithm and a dictionary-based word segmentation method. Because TF/IDF, one of the term-weighting schemes in the Vector Space Model, has its own flaws, this thesis gives an improved term-weighting method that integrates a separability criterion with the characteristics of the information domain.
3) The feasibility of the text categorization system in the information domain is verified on a large amount of test data.
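As a rough illustration of the two preprocessing steps named in the abstract, the sketch below segments text against a dictionary and then computes baseline TF/IDF weights. It is not the thesis's implementation: forward maximum matching is only one common dictionary-based segmentation strategy and is assumed here, the function names `segment` and `tf_idf` and the toy dictionary are hypothetical, and the improved term-weighting method is not reproduced because the abstract does not define it.

```python
import math
from collections import Counter


def segment(text, dictionary, max_len=4):
    """Forward maximum matching (an assumed strategy, not taken from the thesis).

    At each position, take the longest substring (up to max_len characters)
    found in the dictionary; fall back to a single character otherwise.
    """
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def tf_idf(docs):
    """Baseline TF/IDF: weight = (count / doc length) * log(N / document frequency).

    docs is a list of token lists; returns one {term: weight} dict per document.
    """
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return weights


# Toy dictionary and documents, made up for illustration only.
toy_dictionary = {"军事", "情报", "系统", "分类"}
docs = [segment(s, toy_dictionary) for s in ["军事情报系统", "情报分类系统"]]
weights = tf_idf(docs)
```

In this toy run, terms that occur in every document receive a weight of zero, since the IDF factor is log(1). The resulting weight vectors would then be passed to an SVM classifier; that step is omitted here because the abstract gives no details of the kernel or parameters used.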
Keywords/Search Tags: Automatic categorization, Chinese word segmentation, Machine learning, SVM algorithm