
The Studies on Chinese Text Categorization Based on PSO and SVM

Posted on: 2011-04-04
Degree: Master
Type: Thesis
Country: China
Candidate: W L Liu
Full Text: PDF
GTID: 2178330332465287
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of science and technology and the growing popularity of the Internet, the volume of data people face has grown explosively. How to extract information quickly and efficiently from massive, duplicated, and heterogeneous text data has become a cutting-edge research topic. Automatic text classification, an important tool for handling massive amounts of information, sorts a text collection into categories and extracts useful information such as knowledge and rules, which establishes a good organizational structure and improves the efficiency of document access and search. With the spread of digital access technology, automatic text classification has found a widening range of applications, for example digital libraries, automatic e-mail classification, electronic commerce, and news categorization. The study of text classification technology therefore has important academic value and broad application prospects.

This thesis first introduces a variety of traditional Chinese word segmentation algorithms. Based on a study of the features of the common algorithms, it then designs an improved dictionary mechanism and proposes a modified reverse maximum matching method, which greatly improves word segmentation speed and accuracy.

Next, building on an in-depth analysis of the feature selection evaluation algorithms used in text categorization systems, a category-based feature selection algorithm is proposed. Experiments show that the features extracted by the proposed method are more effective than those extracted by traditional feature selection methods, and that the performance and accuracy of the classification system can be greatly improved.

Finally, the model parameter selection problem for the support vector machine (SVM) is studied on the basis of particle swarm optimization (PSO), and the PSO-SVM algorithm is proposed. Built on the mathematical model of the SVM, PSO-SVM introduces particle swarm optimization into the SVM training process to optimize the kernel function parameters and the error penalty factor while selecting the best features at the same time. Once the SVM mathematical model is established, SVM parameter selection is transformed into an integer programming problem. Through the design of the particle and the evaluation function, the algorithm combines the global search capability of particle swarm optimization with the good classification performance of the SVM, which improves the learning and classification of the SVM, raises the text classification accuracy, and reduces the number of features. Tests on Chinese text classification data sets show that the algorithm achieves higher learning ability and better classification accuracy than the GA-SVM algorithm.
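
For readers unfamiliar with the reverse maximum matching idea mentioned in the abstract, the following is a minimal sketch of the plain textbook method only. It does not reproduce the thesis's improved dictionary mechanism or its modifications, which the abstract does not describe. The function name `reverse_max_match` and the toy dictionary are assumptions made for illustration.

```python
def reverse_max_match(text, dictionary, max_len=None):
    """Segment `text` by scanning from the end and taking the longest
    dictionary word that ends at the current position.

    Textbook reverse maximum matching; `dictionary` is any container of
    known words (a set gives O(1) lookups).
    """
    if max_len is None:
        # Longest word in the dictionary bounds the candidate window.
        max_len = max((len(w) for w in dictionary), default=1)

    tokens = []
    end = len(text)
    while end > 0:
        # Try the longest candidate first, shrinking one character at a time.
        for size in range(min(max_len, end), 0, -1):
            candidate = text[end - size:end]
            # A single character is always accepted as a fallback token.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                end -= size
                break

    tokens.reverse()  # tokens were collected right to left
    return tokens


# Hypothetical toy dictionary, for illustration only.
toy_dictionary = {"研究", "研究生", "生命", "命", "起源"}
print(reverse_max_match("研究生命起源", toy_dictionary))
# → ['研究', '生命', '起源']
```

With this toy dictionary, scanning from the right avoids the split 研究生 / 命 / 起源 that a left-to-right longest match would produce, which is the usual motivation for the reverse direction in Chinese segmentation.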
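
The parameter search described in the abstract can be illustrated with a minimal particle swarm optimization sketch. This is a generic continuous PSO, not the thesis's PSO-SVM algorithm: the particle encoding, the evaluation function, and the simultaneous feature selection of the thesis are not given in the abstract and are not reproduced here. The objective `surrogate_error` is a hypothetical stand-in for the cross-validation error of an SVM as a function of the log-scaled penalty factor and kernel parameter, and all names, bounds, and coefficients are assumptions.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=60,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over the box `bounds` with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)

    # Random initial positions inside the box, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]

    # Personal bests and the global best found so far.
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + pull toward personal and global bests.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move the particle and clamp it to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))

            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val

    return gbest, gbest_val


def surrogate_error(params):
    """Hypothetical stand-in for SVM cross-validation error.

    params = (log2 of penalty factor C, log2 of kernel parameter gamma).
    A real evaluation function would train and validate an SVM here.
    """
    log2_c, log2_gamma = params
    return (log2_c - 3.0) ** 2 + (log2_gamma + 5.0) ** 2


best, err = pso_minimize(surrogate_error, bounds=[(-5.0, 15.0), (-15.0, 3.0)])
print("best (log2 C, log2 gamma):", best, "surrogate error:", err)
```

In a real setting the evaluation function would dominate the running time, since every particle evaluation means training and validating a classifier, so the swarm size and iteration count are the main cost knobs.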
Keywords/Search Tags: Text Categorization, Chinese Word Segmentation, Feature Selection, Parameter Optimization, Support Vector Machine