Incremental Learning Approach Of Data Complexity

Posted on: 2016-08-23
Degree: Master
Type: Thesis
Country: China
Candidate: J Li
GTID: 2308330461956023
Subject: Control Science and Engineering
Abstract/Summary:
Classification is the foundation of pattern recognition, machine learning and data mining. As classification theory and its applications are explored in ever greater depth and breadth, new issues and challenges keep emerging, and one of them is especially prominent. For a real task there are many methods and algorithms to choose from at each stage of classification, so how can we measure the difficulty of the problem and the characteristics of the data, and use that information to select suitable methods or schemes while avoiding unnecessary trial and error? Data complexity measures emerged against this background. In practice, however, new data are produced continuously, while existing data complexity algorithms are based on batch learning. As the data scale grows dynamically, measuring the characteristics of the data has become an urgent problem in data mining.
This thesis addresses that problem. Building on an in-depth discussion of T. K. Ho's theory of data complexity, it studies whether the complexity measures can support incremental learning. From the perspective of incremental learning, the 12 measures can be divided into three classes: measures based on sufficient statistics, measures based on classical classifiers, and measures that belong to neither (the third class). Taking these three classes in turn, the thesis discusses and improves the related algorithms so that they can learn incrementally. The main research contents and results are as follows.
Firstly, the current state of research on data classification, data complexity and incremental learning methods is reviewed. It is noted that classification learning offers few guidelines for choosing a suitable algorithm from the many available (Chapter 1).
Secondly, data complexity measures and incremental learning methods are discussed in depth.
The data complexity measures are grouped from the angle of incremental learning, and each is examined against the ideas of incremental learning to decide whether it can learn incrementally: if it can, whether and how this can be realized; if not, why not (Chapter 2).
Thirdly, the new algorithms and measures are verified on both artificial and real data to test their effectiveness. The distribution, boundaries and separability of the artificial data are designed before generation, which makes the experiments highly controllable, whereas results on real data are more reliable. Using both kinds of data therefore allows the new algorithms to be evaluated more reasonably (Chapter 3).
Fourthly, a study of the 12 data complexity measures shows that F1, F2, T2 and N2 are based on sufficient statistics. They involve only sums, means and variances of the data, for which incremental learning methods already exist. Whether these methods carry over to data complexity has not yet been summarized, so this thesis examines their feasibility on artificial data (Chapter 4).
Finally, further study and analysis show that N3 and N4 are based on the KNN classifier (K = 1), and L2 and L3 on a linear classifier. The thesis proposes the I1NN algorithm on the basis of the 1NN classifier and verifies its feasibility and effectiveness on artificial data sets and UCI public data sets. It also analyzes a fast incremental learning algorithm for SVM; experiments on UCI public data sets, with a comparison of the results, verify the validity of that algorithm and thereby realize incremental learning for the complexity measures L2 and L3 (Chapter 5).
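The Chapter 4 observation about sufficient statistics can be made concrete with a small sketch. It assumes a two-class problem with labels 0 and 1 and the usual Ho and Basu definitions: F1 as the maximum Fisher discriminant ratio (mu1 - mu2)^2 / (sigma1^2 + sigma2^2) over features, F2 as the normalized volume of the overlap of the two class bounding boxes, and T2 as the number of samples per dimension. Per-class running counts, means and variances (Welford's update) plus running minima and maxima are enough to re-read all three after each new batch. The class names and the `partial_fit` method are hypothetical, not taken from the thesis.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Running count, mean, variance (Welford) and min/max for one feature of one class."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0          # sum of squared deviations from the running mean
    lo: float = math.inf
    hi: float = -math.inf

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        self.lo = min(self.lo, x)
        self.hi = max(self.hi, x)

    @property
    def var(self) -> float:
        # population variance; 0.0 before any sample has been seen
        return self.m2 / self.n if self.n else 0.0


class IncrementalComplexity:
    """Hypothetical helper (not from the thesis) for a two-class problem with
    labels 0 and 1. It keeps only per-class, per-feature statistics, so F1, F2
    and T2 can be re-read after each batch without revisiting old samples.
    Assumes both classes have at least one sample before the measures are read."""

    def __init__(self, dim: int):
        self.dim = dim
        self.stats = {c: [RunningStats() for _ in range(dim)] for c in (0, 1)}

    def partial_fit(self, X, y) -> None:
        for row, label in zip(X, y):
            for j, v in enumerate(row):
                self.stats[label][j].update(float(v))

    def f1(self) -> float:
        # maximum Fisher discriminant ratio over all features
        best = 0.0
        for a, b in zip(self.stats[0], self.stats[1]):
            num = (a.mean - b.mean) ** 2
            den = a.var + b.var
            if den > 0:
                best = max(best, num / den)
            elif num > 0:
                return math.inf   # constant but distinct class values: perfectly separable
        return best

    def f2(self) -> float:
        # product over features of (overlap of class ranges) / (joint range), clipped at 0
        volume = 1.0
        for a, b in zip(self.stats[0], self.stats[1]):
            span = max(a.hi, b.hi) - min(a.lo, b.lo)
            if span == 0:
                continue          # feature constant across both classes: full overlap, factor 1
            overlap = min(a.hi, b.hi) - max(a.lo, b.lo)
            volume *= max(0.0, overlap) / span
        return volume

    def t2(self) -> float:
        # average number of samples per dimension
        n = sum(feats[0].n for feats in self.stats.values())
        return n / self.dim
```

Because Welford's update is exact up to floating-point rounding, feeding the data in several batches gives the same F1 as a single pass, and memory stays at a fixed five numbers per class and feature however many samples arrive.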
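For the 1NN-based measures, one way to make the computation incremental is to cache, for every stored point, the distance to its nearest same-class neighbour and to its nearest other-class neighbour. A new point then needs a single pass over the stored points to update those caches, after which N3 (the leave-one-out 1NN error rate) and N2 (summed intra-class over summed inter-class nearest-neighbour distances, as defined by Ho and Basu) can be read directly. This is a hypothetical illustration of the idea, not the I1NN algorithm proposed in the thesis, whose details are not given in this abstract; Euclidean distance and the class name `Incremental1NN` are assumptions.

```python
import math


class Incremental1NN:
    """Hypothetical sketch (not the thesis's I1NN): every stored point keeps the
    distance to its nearest same-class (intra) and nearest other-class (inter)
    neighbour, using Euclidean distance. Adding one point costs O(n) distance
    computations, so N2 and N3 never need an O(n^2) rebuild."""

    def __init__(self):
        self.points, self.labels = [], []
        self.intra, self.inter = [], []

    def add(self, x, label) -> None:
        x = [float(v) for v in x]
        my_intra = my_inter = math.inf
        for i, (p, c) in enumerate(zip(self.points, self.labels)):
            d = math.dist(x, p)
            if c == label:
                my_intra = min(my_intra, d)
                self.intra[i] = min(self.intra[i], d)   # new point may be p's nearest same-class neighbour
            else:
                my_inter = min(my_inter, d)
                self.inter[i] = min(self.inter[i], d)   # or p's nearest other-class neighbour
        self.points.append(x)
        self.labels.append(label)
        self.intra.append(my_intra)
        self.inter.append(my_inter)

    def n3(self) -> float:
        # leave-one-out 1NN error rate: a point is misclassified when its nearest
        # other-class neighbour is strictly closer than its nearest same-class one
        if not self.points:
            return 0.0
        errors = sum(1 for a, e in zip(self.intra, self.inter) if e < a)
        return errors / len(self.points)

    def n2(self) -> float:
        # sum of intra-class NN distances over sum of inter-class NN distances,
        # skipping points that do not yet have both kinds of neighbour
        pairs = [(a, e) for a, e in zip(self.intra, self.inter)
                 if math.isfinite(a) and math.isfinite(e)]
        den = sum(e for _, e in pairs)
        return sum(a for a, _ in pairs) / den if den > 0 else math.inf
```

The cache costs two extra numbers per point; in exchange each insertion is linear in the number of stored points instead of quadratic for recomputing all nearest neighbours from scratch.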
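For L2 (the training error of a linear classifier), the thesis analyzes a fast incremental SVM algorithm that this abstract does not specify. As a stand-in, the sketch below uses a different, well-known technique: Pegasos-style stochastic sub-gradient training of a linear SVM, which updates the weight vector one sample at a time and therefore continues naturally from one batch to the next. Labels are assumed to be +1/-1, the class name and its parameters (`lam`, `epochs`) are hypothetical, and the optional projection step of Pegasos is omitted.

```python
class PegasosL2:
    """Hypothetical stand-in (not the thesis's incremental SVM): a Pegasos-style
    stochastic sub-gradient linear SVM with hinge loss and L2 penalty lam.
    Labels are +1/-1. There is no bias term, so append a constant 1.0 feature
    if the classes are not separable through the origin."""

    def __init__(self, dim: int, lam: float = 0.1):
        self.w = [0.0] * dim
        self.lam = lam
        self.t = 0                 # step counter, carried across batches
        self.X, self.y = [], []    # every sample seen so far, needed to evaluate L2

    def _score(self, x) -> float:
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def partial_fit(self, X, y, epochs: int = 1) -> None:
        X = [[float(v) for v in row] for row in X]
        y = list(y)
        self.X.extend(X)
        self.y.extend(y)
        for _ in range(epochs):            # only the new batch is visited
            for x, label in zip(X, y):
                self.t += 1
                eta = 1.0 / (self.lam * self.t)
                margin = label * self._score(x)   # margin uses w before this step's shrink
                self.w = [(1.0 - eta * self.lam) * wi for wi in self.w]
                if margin < 1.0:                  # hinge loss active: step towards y * x
                    self.w = [wi + eta * label * xi for wi, xi in zip(self.w, x)]

    def l2(self) -> float:
        # training error of the current linear classifier over all data seen so far;
        # a score of exactly 0 is counted as an error
        wrong = sum(1 for x, label in zip(self.X, self.y) if label * self._score(x) <= 0)
        return wrong / len(self.X)
```

Training touches only the new batch, which keeps each update cheap, but evaluating L2 still needs every stored sample. Earlier samples influence the model only through the current weights, so a noisy late batch can pull the boundary away from old data; carrying the support vectors forward into the next batch is one common remedy.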
Keywords/Search Tags: Data Geometrical Complexity, Incremental Learning, Sufficient Statistic, KNN, Linear Classifier