
Model Misfit In Classification And Its Solution

Posted on: 2011-05-18
Degree: Master
Type: Thesis
Country: China
Candidate: H J Su
GTID: 2178330338989588
Subject: Computer Science and Technology

Abstract/Summary:
With the development of computer hardware and software technology, ever more data can be collected, and traditional data processing and analysis techniques face new challenges. Classification is a basic task in machine learning and data mining, and how to build a classifier with strong generalization ability has long been a hot research topic. Most classification techniques rest on some underlying models and/or assumptions. When the data fit the model well, classification accuracy can be very high; when the underlying model does not fit the data well, however, the resulting classifier can perform quite poorly. This thesis focuses on how to resolve such model misfit in classification. The main work includes:

1) The Decision Clusters Classifier (DCC), a clustering-based classification algorithm aimed at complex, high-dimensional data, is first introduced. We analyze the model misfit problem for data with feature-space heterogeneity and then propose a new Path-based Decision Clusters Classifier (PDCC) to handle this problem.

2) A new method of building random decision trees to construct a random forest is proposed. Decision trees are generally prone to overfitting, which is itself a specific kind of model misfit. Random decision forests are currently the main way of handling this problem, but they suffer from low efficiency and effectiveness. We therefore introduce a new type of random tree and ensemble such trees into a forest to handle model misfit on multi-class data.

3) We also introduce a new technique for handling model misfit in cluster-and-label methods. Semi-supervised classification methods generally rely on certain assumptions, one of which is the cluster assumption. Cluster-and-label builds directly on the cluster assumption, but its performance is severely constrained by the quality of the clustering; we call this the model misfit problem in semi-supervised classification. This thesis puts forward a hierarchical clustering tree to solve the problem.
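To make the cluster-and-label baseline in item 3 concrete, here is a minimal sketch of the generic technique, not the thesis's hierarchical clustering tree. It assumes plain k-means (initialized deterministically from the first k points) as the clustering step and a majority vote of the labeled members to label each cluster; the function names `kmeans` and `cluster_and_label` are hypothetical. Because every point simply inherits its cluster's label, any cluster that straddles a class boundary mislabels its minority members, which is the dependence on clustering quality that the abstract calls model misfit.

```python
import math
from collections import Counter


def kmeans(points, k, iters=20):
    """Plain k-means (assumed clustering step); centroids start at the first k points."""
    centroids = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid (Euclidean distance).
        for i, p in enumerate(points):
            assign[i] = min(range(k), key=lambda c: math.dist(p, centroids[c]))
        # Update step: move each non-empty cluster's centroid to the mean of its members.
        for c in range(k):
            members = [points[i] for i in range(len(points)) if assign[i] == c]
            if members:
                centroids[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return assign


def cluster_and_label(points, labels, k):
    """Hypothetical cluster-and-label helper.

    labels maps the index of each labeled point to its class. Every point
    receives the majority class of the labeled points in its cluster, or None
    when its cluster contains no labeled point.
    """
    assign = kmeans(points, k)
    cluster_label = {}
    for c in range(k):
        votes = Counter(labels[i] for i in labels if assign[i] == c)
        cluster_label[c] = votes.most_common(1)[0][0] if votes else None
    return [cluster_label[assign[i]] for i in range(len(points))]


# Usage: two well-separated groups, one labeled point in each.
points = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
print(cluster_and_label(points, {0: "a", 1: "b"}, k=2))
# prints ['a', 'b', 'a', 'a', 'b', 'b']
```

Here the clusters match the classes, so the labels propagate correctly. If the clustering split or merged classes, the same majority vote would assign wrong labels with no way to recover, which is why the quality of the clustering step bounds the accuracy of this approach.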
Keywords/Search Tags: clustering, classification, model misfit, semi-supervised learning