
Research On The Application Of Information Gain In Data Mining Classification

Posted on: 2017-08-04    Degree: Master    Type: Thesis
Country: China    Candidate: L C Mao    Full Text: PDF
GTID: 2348330485476554    Subject: Statistics
Abstract/Summary:
Data mining is an emerging discipline that analyzes the characteristics of data, builds data models, and uncovers the inner relationships within data in order to make predictions; one of its most widely used techniques is pattern classification. Among linear classifiers, the Fisher discriminant criterion is the most widely used linear feature-extraction method, and many improvements of it exist, such as the weighted Fisher discriminant; however, when the dimension is too large, classification accuracy after linear discriminant extraction drops significantly. To address this, the optimal-factor-combination Fisher discriminant method constructs a linear discriminant for every possible combination of factors, evaluates each combination by its resubstitution accuracy, and selects the combination with the highest resubstitution accuracy as the optimal factor combination, thereby improving linear classification accuracy. But when there are too many factors, the computational cost grows exponentially, and with more than 15 factors the algorithm becomes infeasible. The idea of the KNN algorithm is to select the k samples in feature space that are most similar to (that is, nearest to) the unknown sample; if the majority of these k samples belong to a certain class, the unknown sample is assigned to that class.
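The KNN rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the thesis's implementation; the function name and the toy data are invented for the example.

```python
from collections import Counter
import math

def knn_classify(x, train, k=3):
    """Classify point x by majority vote among its k nearest training samples.

    train: list of (feature_vector, label) pairs.
    """
    # Sort training samples by Euclidean distance to x.
    by_dist = sorted(train, key=lambda s: math.dist(x, s[0]))
    # Majority vote among the k nearest neighbours.
    votes = Counter(label for _, label in by_dist[:k])
    return votes.most_common(1)[0][0]

# Toy example: two well-separated clusters.
train = [((0.0, 0.0), "A"), ((0.1, 0.2), "A"), ((0.2, 0.1), "A"),
         ((3.0, 3.0), "B"), ((3.1, 2.9), "B"), ((2.9, 3.2), "B")]
print(knn_classify((0.2, 0.0), train))  # → A
```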
Since this method decides the class of the sample to be classified based only on the one or few nearest samples, the decision, although in principle justified by limit theorems, depends on a very small number of neighboring samples and treats all of them as equally important; when sample densities are uneven, misclassification easily occurs. The evidence-theory KNN algorithm introduces evidence functions into KNN: an evidence function is generated from the distance between the test sample and each training sample, the evidence functions are fused, and the most credible evidence determines the final classification. This algorithm effectively remedies the defect of treating all neighbors as equally important and makes full use of the information in the neighboring samples. However, when the sample dimension is too high and there are too many attributes, the computational complexity becomes high and the applicability weak. In improving classification methods, information gain is widely applied to raise accuracy. This thesis introduces information gain to establish an optimal-factor-combination Fisher discriminant classifier based on information gain: the information gain of each factor is computed and sorted in descending order; the top-ranked factors are taken in turn as candidate combinations; for each combination the corresponding discriminant is derived and its resubstitution accuracy computed; and the combination with the highest resubstitution accuracy is selected as the optimal combination. This reduces the computational complexity from exponential to linear, realizing factor optimization for the optimal-combination discriminant classifier.
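The information-gain ranking step can be sketched as follows. This is a simplified sketch assuming discrete-valued factors; the function names are invented for illustration.

```python
from collections import Counter
import math

def entropy(labels):
    """Shannon entropy of a label sequence, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    """Entropy reduction obtained by splitting the labels on one factor."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

def rank_factors(columns, labels):
    """Return factor indices sorted by information gain, descending."""
    gains = [(information_gain(col, labels), i) for i, col in enumerate(columns)]
    return [i for _, i in sorted(gains, reverse=True)]

labels = ["A", "A", "B", "B"]
col_perfect = [0, 0, 1, 1]   # perfectly predicts the label: gain = 1 bit
col_noise = [0, 1, 0, 1]     # uninformative: gain = 0
print(rank_factors([col_noise, col_perfect], labels))  # → [1, 0]
```

With the factors ranked, only the d prefix combinations {top-1}, {top-1, top-2}, … need be evaluated by resubstitution accuracy, rather than all 2^d subsets, which is the exponential-to-linear reduction the abstract describes.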
At the same time, information gain is introduced into the evidence-theory KNN algorithm, yielding an information-gain-based evidence-theory KNN algorithm: before the evidence functions are established, the information gain of each factor is computed, the factors with the largest information gain are selected as the combination, classification proceeds as in the classic KNN algorithm, and combinations are compared by the resubstitution accuracy of the KNN classification. Redundant factors are deleted, screening out the important attributes; the nearest-neighbor samples are then selected on the basis of the screened important attributes, effectively reducing the number of neighbor samples and the computational complexity of evidence fusion. Experiments show that the optimized classifiers effectively eliminate redundant factors and perform well on low-dimensional data, not only achieving good classification accuracy but also effectively improving on the original classification methods, whose accuracy falls sharply on high-dimensional data.
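The evidence-fusion step can be sketched in the spirit of Denoeux-style evidential k-NN: each neighbor contributes a mass function supporting its own class in proportion to its closeness, with the remaining mass left on "unknown" (Theta), and the mass functions are fused by Dempster's rule. The parameters alpha and gamma, the function names, and the toy data are illustrative assumptions, not the thesis's settings.

```python
from collections import defaultdict
import math

def dempster_combine(m1, m2):
    """Dempster's rule for mass functions whose focal sets are singletons or Theta."""
    combined = defaultdict(float)
    conflict = 0.0
    for a, pa in m1.items():
        for b, pb in m2.items():
            if a == "Theta":
                combined[b] += pa * pb          # Theta ∩ B = B
            elif b == "Theta" or a == b:
                combined[a] += pa * pb          # A ∩ Theta = A, or A ∩ A = A
            else:
                conflict += pa * pb             # disjoint singletons
    norm = 1.0 - conflict
    return {s: v / norm for s, v in combined.items()}

def evidential_knn(x, train, k=3, alpha=0.95, gamma=1.0):
    """Evidence-theory KNN: fuse distance-based evidence from the k neighbours.

    Each neighbour at distance d puts mass alpha*exp(-gamma*d^2) on its class
    and the remainder on Theta (ignorance); fused singleton masses decide.
    """
    neighbours = sorted(train, key=lambda s: math.dist(x, s[0]))[:k]
    mass = {"Theta": 1.0}                        # start from the vacuous assignment
    for feats, label in neighbours:
        d = math.dist(x, feats)
        m_c = alpha * math.exp(-gamma * d * d)   # evidence strength for this class
        mass = dempster_combine(mass, {label: m_c, "Theta": 1.0 - m_c})
    # Decide by the class with the largest fused singleton mass.
    return max((c for c in mass if c != "Theta"), key=lambda c: mass[c])

# Toy example.
train = [((0.0, 0.0), "A"), ((0.1, 0.1), "A"),
         ((2.0, 2.0), "B"), ((2.1, 1.9), "B")]
print(evidential_knn((0.05, 0.05), train))  # → A
```

Restricting the focal sets to singletons and Theta keeps Dempster's rule linear in the number of classes; the attribute screening described above would shrink the feature vectors fed to `math.dist` before this fusion runs.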
Keywords/Search Tags: information gain, Fisher discriminant, evidence-theory KNN classification method