Research On Text Classification Filtering Technology Based On Latent Semantic Indexing And Support Vector Machine

Posted on:2011-05-26

Degree:Master

Type:Thesis

Country:China

Candidate:L L Jiang

Full Text:PDF

GTID:2178360305478207

Subject:Computer application technology

Abstract/Summary:

With the development of Internet infrastructure and continued innovation in network technology, the network has increasingly penetrated various industries, and its effects now reach all aspects of human social life. Network applications have gradually moved from entertainment in daily life into the socio-economic realm, and users' requirements of the network have steadily risen. However, how to filter out information that is unrelated to a user's personal interests, and how to protect users from unlawful harassment, has become an urgent problem. Information filtering has therefore become an important part of current research in network information technology.

This paper explores the feasibility of filtering technology in a filtering system, and the performance evaluation indicators of such systems, by analyzing the text information filtering model. Following a modular design concept, the system is divided into four parts: a pre-processing module, a feature dimension reduction module, a training module, and a filter module. On this basis, a filtering system based on latent semantic indexing and support vector machines is designed and implemented.

This paper proposes a feature dimension reduction method based on clustering and the latent semantic indexing model. After an in-depth study of feature dimension reduction methods, and based on the characteristics and requirements of the task, a k-means algorithm improved with mutual information is used for dimensionality reduction: feature items that play the same or similar role in classification are merged, which considerably reduces the number of features. Latent semantic indexing is then applied to the feature set obtained by clustering to compress it at the semantic level; that is, feature extraction further reduces the dimension of the feature space.
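As a rough illustration of the mutual information step, the sketch below scores each term against each class using pointwise mutual information estimated from document counts. The abstract does not give the thesis's exact formula or its improved k-means procedure, so the function names (`term_class_mutual_information`, `class_profile`), the toy corpus, and the choice of the pointwise form are all assumptions made for illustration. Terms whose profiles across classes coincide are the kind of features a clustering step could plausibly merge.

```python
import math
from collections import Counter


def term_class_mutual_information(docs, labels):
    """Return {(term, label): pointwise MI} estimated from document counts.

    Hypothetical helper: the thesis's own MI definition is not given in the abstract.
    """
    n_docs = len(docs)
    class_count = Counter(labels)   # documents per class
    term_count = Counter()          # documents containing each term
    joint_count = Counter()         # documents of a class containing a term
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            term_count[term] += 1
            joint_count[(term, label)] += 1
    scores = {}
    for (term, label), joint in joint_count.items():
        # PMI(t, c) = log( P(t, c) / (P(t) * P(c)) ), with counts in place of probabilities
        scores[(term, label)] = math.log(
            (joint * n_docs) / (term_count[term] * class_count[label])
        )
    return scores


def class_profile(scores, term, classes):
    """MI of one term against every class; unseen (term, class) pairs map to -inf."""
    return tuple(scores.get((term, c), float("-inf")) for c in classes)


# Toy corpus (invented for this example)
docs = [
    ["cheap", "pills", "offer"],
    ["cheap", "offer", "now"],
    ["meeting", "agenda", "now"],
    ["meeting", "notes", "agenda"],
]
labels = ["spam", "spam", "ham", "ham"]
classes = ("spam", "ham")

mi = term_class_mutual_information(docs, labels)
same_role = class_profile(mi, "cheap", classes) == class_profile(mi, "offer", classes)
```

Here "cheap" and "offer" occur in exactly the same classes with the same counts, so their profiles are identical and `same_role` is true, while "now" is spread evenly over both classes and scores zero against each.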
Experiments were conducted, and the results show that the proposed dimension reduction algorithm is feasible. It effectively addresses the problem that a large number of features in the feature set contribute very little to classification.

Among the various text classification algorithms, the thesis focuses on the Support Vector Machine (SVM). It analyzes traditional approaches to multi-class classification and proposes a classification method that combines a genetic algorithm with a binary tree SVM. The genetic algorithm is used to optimize the binary tree SVM model: at each node it divides the multi-class training samples into two groups, reducing the problem to a two-class training problem, and this continues until the leaf nodes are reached. The separability between the child classes at each node is thereby greatly enhanced, a reasonable binary tree structure is obtained, and ultimately an optimal adaptive binary tree is achieved. The method reduces classification time and increases classification accuracy. Finally, to verify the feasibility and effectiveness of the improved algorithm, simulation experiments are carried out on the text classification corpus of Fudan University.
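The binary tree decomposition can be pictured with a small sketch. It is not the thesis's method: to stay self-contained it swaps in an exhaustive search over class splits where the thesis uses a genetic algorithm, and a nearest-centroid rule at each node where the thesis trains an SVM. The names `best_split`, `build_tree`, and `predict`, and the two-dimensional toy data, are hypothetical. What it does show is the structure: each internal node separates the remaining classes into two groups, and prediction walks from the root down to a leaf.

```python
from itertools import combinations


def centroid(points):
    """Mean vector of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_split(classes, data):
    """Choose the two-group split whose group centroids are farthest apart.

    Exhaustive search: a stand-in for the genetic algorithm named in the abstract.
    """
    classes = sorted(classes)
    first, rest = classes[0], classes[1:]
    best = None
    # Keep `first` in the left group so each split is tried once;
    # k < len(rest) guarantees the right group is never empty.
    for k in range(len(rest)):
        for extra in combinations(rest, k):
            left = (first,) + extra
            right = tuple(c for c in rest if c not in extra)
            cl = centroid([p for c in left for p in data[c]])
            cr = centroid([p for c in right for p in data[c]])
            score = sq_dist(cl, cr)
            if best is None or score > best[0]:
                best = (score, left, right, cl, cr)
    return best[1:]


def build_tree(classes, data):
    """Recursively split classes until each leaf holds a single class label."""
    if len(classes) == 1:
        return classes[0]
    left, right, cl, cr = best_split(classes, data)
    return {"left": build_tree(left, data), "right": build_tree(right, data),
            "cl": cl, "cr": cr}


def predict(tree, x):
    """Walk from the root to a leaf; nearest centroid stands in for the node SVM."""
    while isinstance(tree, dict):
        go_left = sq_dist(x, tree["cl"]) <= sq_dist(x, tree["cr"])
        tree = tree["left"] if go_left else tree["right"]
    return tree


# Toy data (invented): three classes in a 2-D feature space
data = {
    "sports": [(0.0, 0.0), (0.2, 0.1)],
    "finance": [(5.0, 5.0), (5.1, 4.8)],
    "economy": [(5.0, 6.0), (4.9, 6.2)],
}
tree = build_tree(tuple(data), data)
label = predict(tree, (5.0, 6.1))
```

With this data the root separates "sports" from the two nearby classes, and a second node separates "economy" from "finance". A sample therefore needs at most two binary decisions here, rather than one decision per class pair.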

Keywords/Search Tags:

text filtering, latent semantic indexing, support vector machine, feature dimension reduction, information filtering model

Related items

1	Research And Practice Of Chinese Text Filtering System Based On Internet
2	Security Filtering Objected To Illegal Text
3	Research On Text Classification Based On Ontology And Latent Semantic Indexing Algorithm
4	Research On LYNC Instant Message Filtering Based On Latent Semantic Index
5	Objectionable Information Filtering System Based On ATN Algorithm And Latent Semantic Indexing
6	Research On Filtering Algorithms Of Text Information Based On SVM
7	Automatic Classification Research On Chinese Web Document Orientation
8	Research On Support Vector Machines Classification Algorithm In Text Categorization
9	Application Research Of Network Information Filtering Model Based On The Content
10	Text Classification Research Based On Support Vector Machine