
Semantic Feature Extraction Algorithms for Content-Based Text Classification

Posted on: 2011-01-25
Degree: Master
Type: Thesis
Country: China
Candidate: Z Dai
Full Text: PDF
GTID: 2208360305994319
Subject: Software engineering
Abstract/Summary:
Content-based text classification is the process of assigning a text to one of several predefined categories according to the characteristics of the text. Extracting semantic features suitable for classification is the key technology of text classification. Semantic feature extraction is important for reducing the dimensionality of text data, filtering noise, and improving classification accuracy. This dissertation studies how to improve traditional semantic feature extraction algorithms without sacrificing precision or efficiency.

The unsupervised feature extraction algorithm principal component analysis and the supervised Fisher discriminant analysis algorithm are studied in depth. On that basis, it is shown that, from the viewpoints of data reconstruction and data discrimination, two optimization ideas need to be established: descriptive features and discriminative features. Drawing on spectral graph theory and the linear approximation of the Laplace-Beltrami operator on Riemannian manifolds, an unsupervised criterion similar to the Fisher discriminant function is constructed. Supervised and discriminative feature extraction algorithms are then analyzed with attention to the correlation between text categories and data features, and to the influence of polysemy and synonymy on classification. The principle adopted is to retain category semantic information after dimensionality reduction, which avoids the lower classification accuracy obtained when only the semantic information of the text is reconstructed. After an analysis of the advantages and disadvantages of the latent semantic indexing feature extraction algorithm, feature transformation is carried out from the viewpoint of matrix algebra using singular value decomposition and generalized eigenvalue decomposition, which reduces feature extraction time without affecting accuracy. An in-depth study of how classical linear discriminant algorithms are applied to semantic feature extraction in text classification, combined with the characteristics of text data, clarifies the differences between text classification and the classical linear discriminant algorithm. Supervised clustering based on the text vector matrix and the density matrix is therefore used to provide category semantic information.

Based on these principles and key technologies, this dissertation designs a discriminative semantic feature extraction algorithm, DSFE. Standard international corpora and a collection of web pages are used as experimental data, and the experimental results are compared using accuracy and normalized mutual information as evaluation measures. The experiments verify that DSFE performs well in time complexity, classification accuracy, and noise resistance.
Keywords/Search Tags: Fisher discriminant analysis, latent semantic indexing, feature transformation, discriminative semantic feature extraction
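
The abstract combines two standard building blocks: latent semantic indexing via singular value decomposition, and Fisher discriminant analysis via a generalized eigenvalue decomposition. The sketch below illustrates only those two textbook steps in sequence; it is not the DSFE algorithm, whose details the abstract does not give. The function names `lsi_project` and `fisher_directions`, the regularization constant `reg`, and the toy document-term matrix are all assumptions made for illustration.

```python
import numpy as np


def lsi_project(X, k):
    """Latent semantic indexing: project documents (rows of X) onto the
    top-k right singular vectors of the document-term matrix."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:k].T  # shape: (n_docs, k)


def fisher_directions(Z, y, n_components, reg=1e-6):
    """Fisher discriminant directions from the generalized eigenproblem
    S_b w = lambda * S_w w, solved by whitening S_w with a Cholesky factor."""
    d = Z.shape[1]
    mean_all = Z.mean(axis=0)
    S_w = np.zeros((d, d))  # within-class scatter
    S_b = np.zeros((d, d))  # between-class scatter
    for c in np.unique(y):
        Zc = Z[y == c]
        mean_c = Zc.mean(axis=0)
        centered = Zc - mean_c
        S_w += centered.T @ centered
        diff = (mean_c - mean_all)[:, None]
        S_b += len(Zc) * (diff @ diff.T)
    S_w += reg * np.eye(d)  # assumed small ridge keeps S_w positive definite

    L = np.linalg.cholesky(S_w)        # S_w = L L^T
    L_inv = np.linalg.inv(L)
    M = L_inv @ S_b @ L_inv.T          # symmetric matrix with the same eigenvalues
    eigvals, eigvecs = np.linalg.eigh(M)
    order = np.argsort(eigvals)[::-1][:n_components]
    return L_inv.T @ eigvecs[:, order]  # map back: w = L^{-T} v


# Hypothetical toy data: 6 documents x 5 terms, two categories.
X = np.array([
    [3, 2, 0, 0, 1],
    [2, 3, 1, 0, 0],
    [3, 3, 0, 1, 0],
    [0, 0, 3, 2, 1],
    [1, 0, 2, 3, 0],
    [0, 1, 3, 3, 0],
], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

Z = lsi_project(X, k=3)                      # unsupervised, descriptive step
W = fisher_directions(Z, y, n_components=1)  # supervised, discriminative step
features = Z @ W                             # one discriminative feature per document
```

Running the discriminant step in the reduced LSI space rather than on the raw term space keeps the scatter matrices small, which is one common way to hold down the cost of the eigenvalue decomposition.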