
The Comparison And Optimization Method Of Linear Discriminants

Posted on: 2016-07-13
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Li
GTID: 2298330467477376
Subject: Computer Science and Technology
Abstract/Summary:
Although the linear classifier is one of the simplest classifiers in pattern recognition, it can achieve good results in many applications. Because it is simple, easy to implement, and undemanding of computing resources, it is widely used. The Fisher linear discriminant (FLD) provides a solution for finding the weight vector, but it gives no clear prescription for choosing the threshold that ultimately determines the location of the separating hyperplane. Commonly used thresholds tend to be biased toward one class on imbalanced problems, which degrades classification performance. This thesis shows that the main factor influencing the FLD is the imbalance of the sample distribution regions, and puts forward several empirical thresholds aimed at imbalanced problems. Each threshold may achieve the best result under a specific distribution or a specific evaluation criterion; by studying the performance of the different thresholds under different evaluation criteria, we determine the scope of application of each.

The pseudo-inverse linear discriminant (PILD) is another widely used linear classifier. This thesis proves that the commonly used assumption about the expected output in the pseudo-inverse method is unreasonable, shows that the FLD and the PILD are not necessarily equivalent even under certain conditions, and studies the influence of the input data on the final result.

Compared with decision trees, neural networks, and other complex classifiers, linear classifiers are less likely to overfit because of their simple assumption that the samples can be roughly separated into two groups by a hyperplane. This thesis argues that the performance of linear classifiers such as the FLD and the pseudo-inverse method can be further improved by combining them with the AdaBoost algorithm; we analyze the characteristics of AdaBoost and use it to improve the performance of the FLD and the PILD.

Finally, the thesis studies the effect of feature representation on classifier performance. It suggests performing dimensionality reduction, rather than adding a tiny perturbation, when the matrix to be inverted is singular, and it proposes a binary-decimal coding method that improves classifier performance while preserving the internal structure of the original data. Experiments show that choosing the right threshold, adopting the proposed feature representation, and combining with AdaBoost together improve the performance of the FLD and the PILD.
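
As a concrete illustration of the threshold question above, the following minimal sketch (in NumPy, assuming two-class training arrays X0 and X1) computes the Fisher direction w from Sw^(-1)(m1 - m0) and then lists a few candidate thresholds on the projected data. The candidates shown (mean midpoint, standard-deviation-weighted midpoint, overlap midpoint) are generic illustrations only, not the specific empirical thresholds proposed in the thesis.

import numpy as np

def fisher_direction(X0, X1, reg=1e-6):
    """Fisher linear discriminant direction w ~ Sw^(-1)(m1 - m0).

    X0, X1: (n0, d) and (n1, d) sample arrays for the two classes.
    reg: small ridge term in case the within-class scatter is singular.
    """
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    S0 = (X0 - m0).T @ (X0 - m0)
    S1 = (X1 - m1).T @ (X1 - m1)
    Sw = S0 + S1 + reg * np.eye(X0.shape[1])
    return np.linalg.solve(Sw, m1 - m0)

def candidate_thresholds(w, X0, X1):
    """A few illustrative thresholds on the projected data."""
    p0, p1 = X0 @ w, X1 @ w
    return {
        # midpoint of the projected class means (the textbook default)
        "mean_midpoint": 0.5 * (p0.mean() + p1.mean()),
        # midpoint weighted by projected standard deviations; the cut
        # moves toward the more compact class, leaving more room for
        # the class with the wider distribution region
        "std_weighted": (p0.std() * p1.mean() + p1.std() * p0.mean())
                        / (p0.std() + p1.std()),
        # midpoint between class 0's largest and class 1's smallest
        # projection (an overlap-region heuristic)
        "range_midpoint": 0.5 * (p0.max() + p1.min()),
    }

def predict(X, w, b):
    """Assign class 1 when the projection exceeds the threshold b."""
    return (X @ w > b).astype(int)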
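The pseudo-inverse discriminant discussed above is conventionally obtained as a least-squares fit of the augmented data matrix to coded class targets. The sketch below uses the common +/-1 target coding; this is precisely the kind of "expected output" assumption the thesis questions, so it should be read as the conventional baseline rather than the thesis's corrected formulation.

def pild_fit(X0, X1):
    """Pseudo-inverse linear discriminant in its conventional form:
    a least-squares fit of [X, 1] to +/-1 class targets.
    Returns (w, b) such that sign(X @ w + b) gives the predicted class.
    """
    X = np.vstack([X0, X1])
    # append a constant column so the bias is learned jointly with w
    A = np.hstack([X, np.ones((X.shape[0], 1))])
    # conventional +/-1 target coding (the baseline assumption)
    t = np.concatenate([-np.ones(len(X0)), np.ones(len(X1))])
    # np.linalg.pinv handles rank-deficient A, e.g. when d > n
    wb = np.linalg.pinv(A) @ t
    return wb[:-1], wb[-1]

def pild_predict(X, w, b):
    return (X @ w + b > 0).astype(int)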
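The combination with AdaBoost can be sketched with the standard AdaBoost.M1 weight-update loop over a generic weak learner. The interface fit_weak(X, y, sample_weight) is a hypothetical placeholder for a sample-weighted FLD or PILD; the abstract does not specify how the base classifiers incorporate the boosting weights, so this shows only the boosting side of the combination.

def adaboost(X, y, fit_weak, n_rounds=20):
    """AdaBoost.M1 over a generic weak learner.

    y: labels in {-1, +1}.
    fit_weak(X, y, sample_weight): returns a callable h with h(X) in {-1, +1}.
    """
    n = len(y)
    w = np.full(n, 1.0 / n)              # uniform initial sample weights
    learners, alphas = [], []
    for _ in range(n_rounds):
        h = fit_weak(X, y, w)
        pred = h(X)
        err = np.sum(w * (pred != y))    # weighted training error
        if err >= 0.5:                   # no better than chance: stop boosting
            break
        err = max(err, 1e-12)            # guard against a perfect weak learner
        alpha = 0.5 * np.log((1.0 - err) / err)
        w *= np.exp(-alpha * y * pred)   # up-weight the misclassified samples
        w /= w.sum()
        learners.append(h)
        alphas.append(alpha)

    def strong(Xq):
        scores = np.zeros(len(Xq))
        for a, h in zip(alphas, learners):
            scores += a * h(Xq)
        return np.sign(scores)           # weighted vote of the weak learners

    return strong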
Keywords/Search Tags:FLD, PILD, Adaboost, Thresholds, Imbalanced datasets