Research on Complaint Text Classification for Imbalanced Data Sets

Posted on: 2016-01-10  Degree: Master  Type: Thesis
Country: China  Candidate: S S Yang  Full Text: PDF
GTID: 2308330464459087  Subject: Computer application technology
Abstract/Summary:
Imbalanced data sets are a widespread form of observational data in many fields, including computer science, medicine, food testing, biology, and economics. A data set is imbalanced when the number of samples in one class is far smaller than the number of samples in the other classes; the smaller class is called the minority class. Minority-class samples are buried in a large volume of data and usually cannot be classified correctly. Classifying imbalanced data sets is therefore a difficult problem in data mining, and it has attracted considerable attention from scholars in many countries.

This thesis introduces the concepts related to imbalanced data sets, briefly surveys the progress that scholars and experts worldwide have made on imbalanced data classification, and reviews the current difficulties, the widely used solutions, and the criteria for evaluating classifier performance. It then examines common sampling techniques and studies how they affect learning on imbalanced data through factors such as data overlap and information loss. On this basis, a sampling algorithm based on the maximum margin is proposed. To limit the side effects of this new technique on classification, the margin is computed approximately under the nearest-neighbor rule, as a simple extension of the sampling method. At the algorithm level, the improved sampling method is combined with an ensemble learning method based on the support vector machine (SVM), which significantly improves classification performance on imbalanced data sets.

The thesis thus addresses imbalanced data classification from two sides, the data level and the algorithm level. Processed data from various complaint websites are used to verify the effectiveness and stability of the strategy, and comparative experiments show good classification results, demonstrating that the technique is effective on imbalanced data sets. The improved SVM-based strategy handles imbalanced classification better, but the best approach would be to design a dedicated kernel function. Kernel functions tailored to imbalanced data sets remain a topic for further research.
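The abstract does not give the details of the margin-based sampling algorithm, so the following is only a minimal sketch of one possible reading. It assumes the approximate margin is the hypothesis margin under the 1-nearest-neighbor rule (half the difference between the distance to the nearest sample of another class and the distance to the nearest sample of the same class), and it assumes a selection rule that keeps the majority samples with the largest margin. The function names, the selection rule, and the toy data are all hypothetical illustrations, not the thesis's method.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def hypothesis_margin(i, X, y):
    """Half the gap between the nearest other-class and nearest same-class sample.

    Assumes every class contains at least two samples.
    """
    near_hit = min(euclidean(X[i], X[j]) for j in range(len(X))
                   if j != i and y[j] == y[i])
    near_miss = min(euclidean(X[i], X[j]) for j in range(len(X))
                    if y[j] != y[i])
    return 0.5 * (near_miss - near_hit)


def undersample_majority(X, y, majority_label, n_keep):
    """Keep every minority sample and the n_keep largest-margin majority samples.

    Returns the sorted indices of the retained samples. The "largest margin"
    rule is an assumption made for this sketch.
    """
    majority = [i for i, label in enumerate(y) if label == majority_label]
    minority = [i for i, label in enumerate(y) if label != majority_label]
    ranked = sorted(majority, key=lambda i: hypothesis_margin(i, X, y),
                    reverse=True)
    return sorted(minority + ranked[:n_keep])


# Toy data: two minority points (label 1) and five majority points (label 0).
# The last majority point sits close to the minority class.
X = [(0, 0), (0, 1), (3, 0), (3, 1), (4, 0), (4, 1), (1, 0.5)]
y = [1, 1, 0, 0, 0, 0, 0]
kept = undersample_majority(X, y, majority_label=0, n_keep=2)
print(kept)
```

On this toy data the majority point at (1, 0.5) has a negative margin, because it lies closer to the minority class than to its own class, and it is dropped first. A real text-classification setup would compute distances on document feature vectors and would pass the resampled set on to the SVM ensemble stage.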
Keywords/Search Tags: classification, sampling, imbalanced data, maximum margin, ensemble learning