
The Improvement Of Isometric Feature Mapping Algorithm And Its Application To Web Chinese Text Categorization

Posted on: 2008-08-15
Degree: Master
Type: Thesis
Country: China
Candidate: J Tu
GTID: 2178360242499182
Subject: Mathematics
Abstract/Summary:
Web Chinese text classification always faces the problem of high feature dimensionality, and feature selection alone cannot resolve it completely. To address this, the thesis studies supervised Isomap algorithms and applies them, as feature extraction methods, to the visualization and classification of Web Chinese text. The research focuses on the following four aspects:

(1) An in-depth study of the procedure and theory of the Isomap algorithm and of the embedding of test data. First, the thesis analyzes the theory and procedure of the MDS algorithm, the predecessor of Isomap, and derives a necessary and sufficient condition under which MDS yields an exact solution. It then studies the Isomap algorithm and its characteristics, derives the theoretical foundation of Isomap, and completes the proof of this theory. Finally, it derives a direct method for embedding test data with the Isomap algorithm.

(2) An in-depth analysis of the kernel Isomap algorithm and its embedding of test data. The thesis derives kernel principal component analysis for non-centered data in feature space and obtains a general expression for embedding test data. It then examines the kernel Isomap algorithm, analyzing the construction of its kernel matrix and the embedding of test data.

(3) Two novel supervised Isomap algorithms. Isomap is an unsupervised method, and embedding test data with existing supervised Isomap algorithms is generally very complex. To address this, the thesis first proposes the supervised Isomap (I) algorithm. It introduces a class parameter, based on the class information provided by the training data set, to adjust the geodesic distances between training points from different classes, and it embeds test data with the method that the landmark Isomap algorithm uses for non-landmark points. Supervised Isomap (I) makes full use of class information and embeds test data easily. Next, because the centered squared geodesic distance matrix in the Isomap algorithm is not guaranteed to be positive semi-definite, the thesis proposes the supervised Isomap (II) algorithm, which is based on the constant-adding method of the kernel Isomap algorithm and uses the direct embedding method derived for Isomap. Supervised Isomap (II) retains the advantages of making full use of class information and embedding test data easily, and it also better preserves the relative geodesic distances adjusted by class information, which facilitates classification.

(4) A study of the Web Chinese text classification process and the application of the two proposed supervised Isomap algorithms to the visualization and classification of Web Chinese text. The thesis studies the process and key techniques of Web Chinese text classification in depth, especially feature selection and feature extraction. The two proposed algorithms are then applied in visualization and classification experiments on Web Chinese text. Compared with existing methods, they achieve better visualization and classification performance, which demonstrates their validity for both tasks.
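The pipeline described in aspects (1) and (3) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not state how the class parameter enters the distances, so multiplying inter-class geodesic distances by a factor `alpha` is an assumption made purely for illustration, and the function names `geodesic_distances`, `adjust_by_class`, and `classical_mds` are hypothetical. Negative eigenvalues are simply clipped to zero here, which is a swapped-in simplification and not the constant-adding method of supervised Isomap (II). Embedding of test data is not shown.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path


def geodesic_distances(X, k=5):
    """Approximate geodesic distances by shortest paths on a k-NN graph."""
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))      # pairwise Euclidean distances
    n = D.shape[0]
    G = np.zeros((n, n))                       # zero entry means "no edge"
    for i in range(n):
        nbrs = np.argsort(D[i])[1:k + 1]       # skip the point itself
        G[i, nbrs] = D[i, nbrs]
    G = np.maximum(G, G.T)                     # make the graph symmetric
    # Dijkstra on the undirected graph; disconnected pairs would come out inf.
    return shortest_path(G, method="D", directed=False)


def adjust_by_class(D, labels, alpha=2.0):
    """ASSUMPTION: stretch distances between points of different classes."""
    labels = np.asarray(labels)
    different = labels[:, None] != labels[None, :]
    return np.where(different, alpha * D, D)


def classical_mds(D, dim=2):
    """Classical MDS: eigendecompose the double-centered squared distances."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J
    w, V = np.linalg.eigh(B)                   # eigenvalues in ascending order
    order = np.argsort(w)[::-1][:dim]          # indices of the largest ones
    w_top = np.clip(w[order], 0.0, None)       # simplification: drop negatives
    return V[:, order] * np.sqrt(w_top)


# Toy data: 40 points on a helix arc, first half class 0, second half class 1.
t = np.linspace(0.0, 3.0, 40)
X = np.column_stack([np.cos(t), np.sin(t), 0.1 * t])
labels = np.array([0] * 20 + [1] * 20)

D_geo = geodesic_distances(X, k=5)
D_sup = adjust_by_class(D_geo, labels, alpha=2.0)
Y = classical_mds(D_sup, dim=2)                # 2-D coordinates for plotting
```

When the input distances are exactly Euclidean, the double-centered matrix `B` is positive semi-definite and `classical_mds` reproduces them exactly. Geodesic or class-adjusted distances need not satisfy this, which is the difficulty that motivates supervised Isomap (II).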
Keywords/Search Tags: Web Chinese text classification, feature extraction, isometric feature mapping, multidimensional scaling method, kernel Isomap method, supervised Isomap method, visualization