Complaints Text Classification Research Of Imbalanced Data Sets

Posted on:2016-01-10

Degree:Master

Type:Thesis

Country:China

Candidate:S S Yang

Full Text:PDF

GTID:2308330464459087

Subject:Computer application technology

Abstract/Summary:

Imbalanced data sets are a form of observational data that occurs widely in computer science, medicine, food testing, biology and economics. A data set is imbalanced when the number of samples in one class is far smaller than the number of samples in the other classes; the class with fewer samples is called the minority class. Minority class samples are buried in a large amount of other data and usually cannot be classified correctly. Classifying imbalanced data sets is therefore a difficult problem in data mining, and researchers in many countries attach great importance to it.

This thesis first introduces the concepts related to imbalanced data sets and briefly summarizes the progress made by researchers worldwide on the imbalanced classification problem. It reviews the current difficulties of imbalanced data set classification, the widely used solutions, and the evaluation criteria for classifier performance. In studying common sampling techniques, it examines how factors such as data overlap and lack of information affect classification learning on imbalanced data, and on this basis proposes a sampling algorithm based on the maximum margin. To limit the influence of the nearest-neighbor rule on this new technique, the sampling is extended in a simple way with an approximate calculation of the margin under the classifier's hypothesis. At the algorithm level, the improved sampling method is combined with an ensemble method based on the support vector machine (SVM), and classification performance on imbalanced data sets improves significantly.

The thesis thus addresses imbalanced data set classification from two sides, the data level and the algorithm level. Processed data from various complaint websites is used to verify the effectiveness and stability of this strategy, and comparative experiments show good classification results. The improved SVM-based strategy can better solve the imbalanced classification problem, but the best approach would be to design a special kernel function. A kernel function designed specifically for imbalanced data sets remains a topic for further research.
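
The abstract notes that minority class samples usually cannot be classified correctly and refers to evaluation criteria for classifier performance. As a minimal sketch that is not taken from the thesis, the Python example below shows why plain accuracy can mislead on imbalanced data. The function name `minority_metrics`, the toy labels, and the choice of recall, F1 and G-mean are assumptions made for illustration; the thesis may use different criteria.

```python
from math import sqrt


def minority_metrics(y_true, y_pred, minority=1):
    """Compute accuracy and minority-focused scores for binary labels.

    Hypothetical helper for illustration only; `minority` is the label
    treated as the minority (positive) class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == minority and p == minority)
    fn = sum(1 for t, p in pairs if t == minority and p != minority)
    fp = sum(1 for t, p in pairs if t != minority and p == minority)
    tn = sum(1 for t, p in pairs if t != minority and p != minority)

    accuracy = (tp + tn) / len(pairs)
    recall = tp / (tp + fn) if (tp + fn) else 0.0          # minority recall
    specificity = tn / (tn + fp) if (tn + fp) else 0.0     # majority recall
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    g_mean = sqrt(recall * specificity)  # geometric mean of both recalls

    return {"accuracy": accuracy, "recall": recall,
            "f1": f1, "g_mean": g_mean}


# Toy data (assumed): 95 majority samples, 5 minority samples.
y_true = [0] * 95 + [1] * 5
# A trivial classifier that always predicts the majority class.
y_pred = [0] * 100

scores = minority_metrics(y_true, y_pred)
print(scores)
```

In this toy case the classifier never finds a minority sample, so recall, F1 and G-mean are all zero even though accuracy is high. Scores of this kind are one common way to compare sampling or SVM-based strategies on imbalanced data.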

Keywords/Search Tags:

classification, sampling, imbalanced data, maximum margin, ensemble learning

Related items

1	Research On Imbalanced Data Classification Algorithms Based On Ensemble Learning
2	Research On Imbalanced Data Classification Based On Sampling Method And Ensemble Learning
3	Comprehensive Oversampling And Undersampling Study Of Imbalanced Data Sets
4	Imbalanced Data Classification Algorithm Based On Unsupervised Intelligent Under Sampling Method
5	The Research Of Imbalanced Data Classification
6	The Algorithm Research Of Associative Classification And Classification Based On Imbalanced Data
7	Research On Ensemble Approach For Classification Of Imbalanced Data Sets
8	Hybrid Ensemble Learning For Imbalanced Data
9	Research And Application On Imbalanced Data Set Classification Problems
10	Classification In Imbalanced Data Based On Over-Sampling And Ensemble Learning