
Tree Classifier In Variant Space

Posted on: 2009-10-13
Degree: Master
Type: Thesis
Country: China
Candidate: P He
Full Text: PDF
GTID: 2178360242993652
Subject: Computer application technology
Abstract/Summary:
Classification is an important research area in machine learning. Among existing classifiers, the decision tree is one of the most useful and effective tools. However, traditional decision tree algorithms typically sacrificed runtime performance to cope with the limited main memory of their era, and the univariate test at each tree node can only produce axis-parallel hyper-rectangle decision boundaries, which generalize poorly on complex datasets. This thesis addresses these limitations with the following contributions.

First, we propose Fast C4.5, a fast main-memory implementation of the C4.5 algorithm. Fast C4.5 uses a preprocessing step to extract the ordering of all tuples on every attribute, optimizes the evaluation of continuous-attribute splits with an indirect bucket sort combined with a bit-parallel technique, and accelerates the search for the cutoff point by binary search within the narrowest possible range. Some structural integration is also employed to reduce redundancy and improve efficiency. Fast C4.5 reduces part of the time complexity of C4.5, and experiments show that it greatly accelerates tree construction.

Second, we present the Latent Attribute Space Tree (LAST) classifier framework, which transforms the original attribute space into a more separable or more classifier-oriented latent attribute space, so that the hyper-rectangle decision boundary of traditional tree classifiers can be extended and their generalization ability improved. Under the LAST framework, we design two Singular-vector-space Oblique Decision Tree (SODT) algorithms. SODT constructs a singular vector space for the global and/or local data, builds a traditional decision tree (or individual tree nodes) in the new space, and thereby indirectly obtains an approximately optimal oblique decision tree for the original attribute space. Experimental results show that, compared with traditional univariate decision trees and other oblique decision tree algorithms, SODT achieves higher classification accuracy and more stable tree size, with tree-construction time comparable to that of univariate decision trees and much lower than that of other oblique decision trees.

Third, we propose the Nonlinear Manifold Mapping Classifier (NMMC) framework, which integrates three variable components, namely manifold mapping, the classifier, and test-data extension, into a unified framework for nonlinear classifier design. Under the NMMC framework, we propose the Spectral Space Decision Tree (SSDT) algorithm. SSDT implements the manifold mapping of NMMC as a spectral space transformation on the Laplacian matrix, which simplifies the relationship between the new conditional attributes and the class attribute, and it selects the decision tree as its classifier for simplicity. By integrating the class information of the training data into the originally unsupervised spectral space transformation, we also present a spectral space decision tree algorithm based on supervised manifold mapping, which maps the different classes of data into a new space with better separation. Experiments show that SSDT has a clear advantage over traditional decision tree algorithms in classification accuracy, decision tree size, and robustness.
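The split-evaluation idea behind Fast C4.5 described above, scanning a presorted continuous attribute once while maintaining running class counts, can be sketched as follows. This is a minimal Python illustration under our own assumptions, not the thesis's implementation: the indirect bucket sort, the bit-parallel technique, and the narrowest-range binary search are all omitted, and the function names are ours.

```python
import numpy as np

def entropy(counts):
    """Shannon entropy of a class-count vector."""
    total = counts.sum()
    if total == 0:
        return 0.0
    p = counts[counts > 0] / total
    return float(-(p * np.log2(p)).sum())

def best_split(x, y, n_classes):
    """Scan one presorted continuous attribute, maintaining running
    class counts, and return (best_gain, best_cutoff)."""
    order = np.argsort(x)          # ordering extracted once per attribute
    xs, ys = x[order], y[order]
    total = np.bincount(ys, minlength=n_classes).astype(float)
    left = np.zeros(n_classes)
    base = entropy(total)
    n = len(xs)
    best_gain, best_cut = 0.0, None
    for i in range(n - 1):
        left[ys[i]] += 1
        if xs[i] == xs[i + 1]:     # cutoffs only between distinct values
            continue
        right = total - left
        w = (i + 1) / n
        gain = base - w * entropy(left) - (1 - w) * entropy(right)
        if gain > best_gain:
            best_gain, best_cut = gain, (xs[i] + xs[i + 1]) / 2
    return best_gain, best_cut
```

Cutoffs are only placed between distinct adjacent values; the midpoint choice here is illustrative rather than the algorithm's exact cutoff rule.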
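The key observation behind SODT, that an axis-parallel split in the singular vector space is an oblique split in the original attribute space, can be sketched with a global SVD. The function name, the centering step, and the use of numpy are our illustrative assumptions, not the thesis's exact construction:

```python
import numpy as np

def singular_vector_space(X):
    """Map data into the space spanned by its right singular vectors.
    Returns the transformed data Z and the matrix Vt whose rows are
    the right singular vectors of the centered data."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt.T, Vt
```

A univariate threshold t on the j-th new attribute, z_j = Vt[j] @ (x - mean), corresponds to the oblique hyperplane Vt[j] @ (x - mean) <= t in the original attribute space, which is precisely the kind of boundary a univariate tree cannot represent directly.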
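The unsupervised spectral transformation that SSDT builds on can be sketched as a generic normalized-Laplacian embedding. The Gaussian affinity, the bandwidth sigma, and the choice of eigenvectors below are illustrative assumptions on our part, not the thesis's exact construction:

```python
import numpy as np

def spectral_embedding(X, sigma=1.0, k=2):
    """Unsupervised spectral transform: Gaussian affinity graph,
    symmetric normalized Laplacian, and the k smallest nontrivial
    eigenvectors as the new conditional attributes."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    deg = W.sum(axis=1)
    Dinv = 1.0 / np.sqrt(deg)
    # L = I - D^{-1/2} W D^{-1/2}
    L = np.eye(len(X)) - (Dinv[:, None] * W) * Dinv[None, :]
    vals, vecs = np.linalg.eigh(L)       # eigenvalues in ascending order
    return vecs[:, 1:k + 1]              # skip the trivial eigenvector
```

A standard decision tree can then be trained on the embedded coordinates; on well-clustered data, even the first nontrivial eigenvector (the Fiedler vector) tends to separate the groups, simplifying the relationship between the new attributes and the class attribute.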
Keywords/Search Tags: machine learning, classification, decision tree, C4.5, singular vector space, manifold mapping, spectral method