
Research On Subspace Classification Methods And Their Applications

Posted on: 2014-01-10
Degree: Master
Type: Thesis
Country: China
Candidate: J F Zhang
Full Text: PDF
GTID: 2268330401474769
Subject: Computer application technology
Abstract/Summary:
Classification analysis is an important research topic in data mining and has been widely used in fields such as text mining, bioinformatics, and information security. The development of information technology has led to exponential growth in many kinds of data, which commonly have several hundred or even a thousand dimensions. Classification techniques must therefore be extended to high-dimensional data.

Because of the inherent sparsity of such data, the "empty space" phenomenon, and the curse of dimensionality, the similarity measures commonly used for low-dimensional classification degrade rapidly in high-dimensional space and are difficult to apply to high-dimensional data. Dimensionality reduction can lower the dimensionality effectively so that conventional methods can run in the reduced space, yet this approach still struggles to meet the demands of high-dimensional data classification. Feature weighting, which has been applied successfully to high-dimensional data, has been widely integrated with classification, giving rise to many subspace classification methods. However, these methods need further improvement to keep up with the growing demands of classification analysis and the increasing complexity of the data.

To address these problems, this thesis proposes several new methods that focus on nearest-neighbor classification and on the characteristics of high-dimensional data in both the entire space and its subspaces. The proposed methods are also applied to text categorization and malicious code detection. The research in this thesis has both theoretical and practical significance.

The main work and contributions can be summarized as follows:

1. A new subspace classification algorithm based on feature weighting is proposed, in which supervised training is combined with unsupervised learning to optimize the classification model and improve classification accuracy.
2. A new projected-prototype based classifier is proposed for high-dimensional text categorization. A newly designed objective function and projection method are used to learn the optimized feature subspaces and projected prototypes.
3. A new feature selection method is proposed to learn the feature subspace, on which a classification system is built for detecting obfuscated malicious code.
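The abstract does not give the algorithms themselves, so the following is only a minimal sketch of the general idea behind feature-weighted subspace classification, not the method proposed in the thesis. It assumes one prototype (centroid) per class and class-specific feature weights that are inversely related to within-class variance, so that features on which a class is compact dominate the distance for that class. The function names `fit_weighted_prototypes` and `predict`, the smoothing constant `eps`, and the toy data are all hypothetical.

```python
def fit_weighted_prototypes(X, y, eps=1e-3):
    """Return {label: (centroid, weights)}.

    Assumption for illustration: weights are normalized inverse
    within-class variances, so they sum to 1 for each class.
    """
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        n, d = len(rows), len(rows[0])
        centroid = [sum(r[j] for r in rows) / n for j in range(d)]
        variance = [sum((r[j] - centroid[j]) ** 2 for r in rows) / n
                    for j in range(d)]
        inverse = [1.0 / (v + eps) for v in variance]  # eps avoids division by zero
        total = sum(inverse)
        weights = [w / total for w in inverse]
        model[label] = (centroid, weights)
    return model


def predict(model, x):
    """Assign x to the class whose prototype is nearest under that
    class's own weighted squared Euclidean distance."""
    def weighted_distance(centroid, weights):
        return sum(w * (a - c) ** 2 for w, a, c in zip(weights, x, centroid))
    return min(model, key=lambda label: weighted_distance(*model[label]))


# Toy data: feature 0 separates the classes, feature 1 is high-variance noise.
X = [(0.0, 5.0), (0.2, -4.0), (0.1, 9.0), (-0.1, -7.0),
     (3.0, 6.0), (3.2, -5.0), (2.9, 8.0), (3.1, -6.0)]
y = ["a", "a", "a", "a", "b", "b", "b", "b"]

model = fit_weighted_prototypes(X, y)
label_near_a = predict(model, (0.3, 8.0))
label_near_b = predict(model, (2.8, -6.0))
```

In this toy setting almost all of each class's weight falls on feature 0, so the noisy feature 1 contributes little to the distance. Real subspace classifiers typically learn the weights by optimizing an objective function rather than using a fixed variance rule as assumed here.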
Keywords/Search Tags: subspace classification, feature-weighting, projection, feature selection, text categorization, malicious code detection