
Research On Text Classification Filtering Technology Based On Latent Semantic Indexing And Support Vector Machine

Posted on: 2011-05-26
Degree: Master
Type: Thesis
Country: China
Candidate: L L Jiang
GTID: 2178360305478207
Subject: Computer application technology
Abstract/Summary:
With the development of Internet infrastructure and continuing innovation in network technology, the network has increasingly penetrated every industry and now affects all aspects of social life. Network applications have expanded from entertainment into the socio-economic realm, and users' expectations of the network have risen steadily. However, filtering out Internet content that is unrelated to a user's interests, and protecting users from unlawful harassment, has become an urgent problem. Information filtering has therefore become an important research topic in current network information technology.
By analyzing the text information filtering model, this thesis examines the feasibility of the relevant techniques for a filtering system and the indicators used to evaluate filtering-system performance. Following a modular design, the system is divided into four parts: a preprocessing module, a feature dimension reduction module, a training module, and a filtering module. On this basis, a filtering system based on latent semantic indexing (LSI) and support vector machines (SVM) is designed and implemented.
This thesis proposes a feature dimension reduction method based on clustering and the latent semantic indexing model. After an in-depth study of feature dimension reduction methods, and guided by the characteristics and requirements of the task, it uses a k-means algorithm improved with mutual information to reduce dimensionality: features that play the same or similar roles in classification are merged, which considerably reduces the number of features. The feature set obtained by clustering is then compressed at the semantic level with latent semantic indexing; that is, feature extraction further reduces the dimension of the feature space.
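The abstract does not specify how mutual information modifies k-means or how the LSI step is parameterized, so the following is only a minimal NumPy sketch under stated assumptions, not the thesis's implementation. Assumed here: each term is represented by its class profile P(c | t); k-means is seeded with the k terms of highest mutual information and uses MI-weighted centroids (both hypothetical choices); the term counts in each cluster are summed into one merged feature; and LSI is a truncated SVD of the resulting document-by-feature matrix. All function names are illustrative.

```python
import numpy as np

def term_class_mi(X, y):
    """Mutual information I(T; C) between each term's presence/absence T
    and the class label C, estimated from document counts."""
    B = X > 0                                      # docs x terms, True if the term occurs
    n_docs = B.shape[0]
    mi = np.zeros(B.shape[1])
    for c in np.unique(y):
        in_c = y == c
        p_c = in_c.mean()
        for present in (True, False):
            occurs = B == present
            joint = (occurs & in_c[:, None]).sum(axis=0) / n_docs  # P(T=present, C=c)
            p_t = occurs.sum(axis=0) / n_docs                      # P(T=present)
            with np.errstate(divide="ignore", invalid="ignore"):
                contrib = joint * np.log(joint / (p_t * p_c))
            mi += np.nan_to_num(contrib)           # treat 0 * log 0 as 0
    return mi

def cluster_features(X, y, k, n_iter=20):
    """k-means over term class profiles; seeding and centroid weighting by MI
    are hypothetical stand-ins for the thesis's (unspecified) MI improvement."""
    B = (X > 0).astype(float)
    classes = np.unique(y)
    Y = (y[:, None] == classes[None, :]).astype(float)        # docs x classes, one-hot
    df = B.sum(axis=0)                                        # document frequency per term
    profile = (B.T @ Y) / np.maximum(df, 1)[:, None]          # terms x classes, P(c | t)
    mi = term_class_mi(X, y)
    centroids = profile[np.argsort(mi)[::-1][:k]].copy()      # seed with k most informative terms
    for _ in range(n_iter):
        dists = ((profile[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = labels == j
            if members.any():
                w = mi[members] + 1e-12                       # informative terms pull harder
                centroids[j] = (w[:, None] * profile[members]).sum(axis=0) / w.sum()
    return labels

def merge_features(X, labels, k):
    """Sum the counts of all terms in each cluster into one merged feature."""
    return X @ np.eye(k)[labels]                              # docs x k

def lsi(M, r):
    """Truncated SVD: M ~ U_r S_r V_r^T. Returns document coordinates U_r S_r and
    the basis V_r^T; a new document vector q is folded in as q @ basis.T."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :r] * s[:r], Vt[:r]

# Toy usage on random counts (40 documents, 30 terms, 4 classes)
rng = np.random.default_rng(0)
X = rng.poisson(0.5, size=(40, 30)).astype(float)
y = np.repeat([0, 1, 2, 3], 10)
labels = cluster_features(X, y, k=8)
merged = merge_features(X, labels, k=8)
docs_lsi, basis = lsi(merged, r=3)
print(merged.shape, docs_lsi.shape)  # (40, 8) (40, 3)
```

In this sketch the number of feature clusters `k` and the number of latent dimensions `r` are free parameters; the abstract does not report the values used in the thesis.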
Experiments show that the algorithm is feasible and effectively addresses the problem of feature sets that contain a large number of features contributing very little to classification.
Among the many text classification algorithms, this thesis focuses on the support vector machine (SVM). After analyzing the traditional approaches to multi-class classification, it proposes a classification method that combines a genetic algorithm (GA) with a binary-tree SVM, using the GA to optimize the binary-tree SVM model. At each node, the GA divides the multi-class training samples into two groups, turning the node into a two-class training problem, and this is repeated until the leaf nodes are reached. This greatly enhances the separability between the classes in the child nodes, yields a reasonable binary tree structure, and ultimately produces an optimal adaptive binary tree. The method reduces classification time and improves classification accuracy. Finally, to verify the feasibility and effectiveness of the improved algorithm, simulation experiments were carried out on the Fudan University text classification corpus.
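The abstract gives no details of the chromosome encoding or fitness function, so this is a hedged sketch of the general idea rather than the thesis's algorithm. Assumed here: each chromosome is a bit mask assigning the node's classes to the left or right child; fitness is a Fisher-style ratio (squared distance between group means over within-group scatter), a hypothetical separability measure; selection, crossover and mutation are simple tournament, uniform and bit-flip operators; and each node's two-class SVM is a linear SVM trained by subgradient descent on the hinge loss instead of a library solver. All names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def separability(X, y, classes, mask):
    """Hypothetical fitness: between-group mean distance / within-group scatter."""
    left = np.isin(y, classes[mask])
    if left.all() or not left.any():
        return -np.inf                                     # invalid: one side is empty
    XL, XR = X[left], X[~left]
    between = np.sum((XL.mean(0) - XR.mean(0)) ** 2)
    within = ((XL - XL.mean(0)) ** 2).sum(1).mean() + ((XR - XR.mean(0)) ** 2).sum(1).mean()
    return between / (within + 1e-12)

def ga_partition(X, y, classes, pop=20, gens=30, p_mut=0.1):
    """Search bit masks over the node's classes for the most separable 2-way split."""
    n = len(classes)
    P = rng.integers(0, 2, size=(pop, n)).astype(bool)
    for _ in range(gens):
        scores = np.array([separability(X, y, classes, m) for m in P])
        idx = rng.integers(0, pop, size=(pop, 2))          # tournament selection
        winners = np.where(scores[idx[:, 0]] >= scores[idx[:, 1]], idx[:, 0], idx[:, 1])
        parents = P[winners]
        mates = np.roll(parents, 1, axis=0)
        children = np.where(rng.random((pop, n)) < 0.5, parents, mates)  # uniform crossover
        children ^= rng.random((pop, n)) < p_mut           # bit-flip mutation
        children[0] = P[scores.argmax()]                   # elitism: keep the current best
        P = children
    scores = np.array([separability(X, y, classes, m) for m in P])
    if not np.isfinite(scores.max()):
        return np.arange(n) == 0                           # fallback: first class vs the rest
    return P[scores.argmax()]

def train_linear_svm(X, t, lam=0.01, lr=0.1, epochs=300):
    """Minimize lam/2 ||w||^2 + mean hinge loss by full-batch subgradient descent; t in {-1, +1}."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        viol = t * (X @ w + b) < 1                         # samples inside the margin
        grad_w = lam * w - (t[viol][:, None] * X[viol]).sum(axis=0) / len(t)
        grad_b = -t[viol].sum() / len(t)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def build_tree(X, y):
    classes = np.unique(y)
    if len(classes) == 1:
        return {"leaf": classes[0]}
    go_left = np.isin(y, classes[ga_partition(X, y, classes)])
    w, b = train_linear_svm(X, np.where(go_left, 1.0, -1.0))
    return {"w": w, "b": b,
            "left": build_tree(X[go_left], y[go_left]),
            "right": build_tree(X[~go_left], y[~go_left])}

def predict(tree, x):
    while "leaf" not in tree:
        tree = tree["left"] if x @ tree["w"] + tree["b"] >= 0 else tree["right"]
    return tree["leaf"]

# Toy usage: three Gaussian blobs in 2-D
data_rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
X = np.vstack([c + 0.5 * data_rng.standard_normal((30, 2)) for c in centers])
y = np.repeat([0, 1, 2], 30)
tree = build_tree(X, y)
preds = np.array([predict(tree, x) for x in X])
print("training accuracy:", (preds == y).mean())
```

A sample is classified by walking from the root and following the sign of each node's SVM, so a tree over K classes needs at most K - 1 binary decisions per prediction, compared with K(K - 1)/2 for a one-vs-one scheme; this is one plausible reading of the classification-time saving the abstract reports.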
Keywords/Search Tags: text filtering, latent semantic indexing, support vector machine, feature dimension reduction, information filtering model