Research On Data Missing Problem Of Imbalanced Data Set

Posted on:2017-04-16

Degree:Master

Type:Thesis

Country:China

Candidate:T T Zhang

Full Text:PDF

GTID:2348330482986414

Subject:Computer Science and Technology

Abstract/Summary:

Imbalanced data sets are widespread in data mining. Because the number of samples differs greatly between classes, standard classification algorithms perform poorly on them. Missing data is an equally unavoidable problem in data mining: during collection or storage, environmental and other factors can cause individual values or entire attributes to go missing, and the information they carried may be lost with them. Imbalanced data sets and data sets with missing data both complicate data analysis and knowledge discovery, so research on such data sets has attracted growing attention. With the rapid development of computer technology, classification based on data mining and machine learning has become a means for enterprises and organizations to make fast decisions and accurate judgments, and an effective decision aid. Imbalanced data sets with missing data arise in computer science, bioinformatics, economics, and other application fields. With imbalance, the minority classes are often the ones people care about; with missing data, the concern is the loss of useful information. Handling such data sets properly is therefore especially important.
This paper first describes the problems of imbalanced data sets and missing data and summarizes the results achieved on such data sets by researchers in China and abroad. It then explains how imbalance combined with missing data affects classification, the general processing methods, and the criteria for evaluating classifier performance. Missing data values and missing attributes are also described in detail. To make the best use of the data already in the data set, this paper proposes a strategy for imputing missing data values based on density clustering and grey relational analysis.
At the same time, it applies transfer learning to handle missing attributes, using the spectral feature alignment algorithm to augment the attributes. It combines this with a cluster-boundary sampling method based on density clustering to address the sample imbalance in the data set. A support vector machine then serves as the classification model for the data set processed by the steps above. Finally, the approach to imbalanced data sets with missing data is applied to computer-aided medical diagnosis based on data mining. Experiments on real medical data sets verify the proposed method: it achieves good classification performance and can assist doctors in diagnosis.
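
The abstract does not give the details of the imputation strategy, so the following is only a minimal sketch of how grey relational analysis can drive value imputation, not the thesis's actual algorithm. It assumes numeric attributes already scaled to comparable ranges, uses Deng's grey relational grade with the customary distinguishing coefficient of 0.5, and fills a missing value with the grade-weighted mean of the most similar complete samples. The function names and the parameters `k` and `rho` are hypothetical. The density clustering step is omitted; in a fuller version the candidate rows might be limited to the cluster containing the incomplete sample.

```python
def grey_relational_grades(reference, candidates, rho=0.5):
    """Deng's grey relational grade of each candidate row w.r.t. the reference row."""
    # Absolute differences between the reference and every candidate, per attribute.
    deltas = [[abs(r - c) for r, c in zip(reference, cand)] for cand in candidates]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0:
        # Every candidate equals the reference on all compared attributes.
        return [1.0] * len(candidates)
    # Grade = mean over attributes of the grey relational coefficient.
    return [
        sum((d_min + rho * d_max) / (d + rho * d_max) for d in row) / len(row)
        for row in deltas
    ]


def impute_row(row, complete_rows, k=2, rho=0.5):
    """Fill None entries of `row` from the k complete rows with the highest grade."""
    observed = [j for j, v in enumerate(row) if v is not None]
    missing = [j for j, v in enumerate(row) if v is None]
    if not missing:
        return list(row)
    if not observed:
        raise ValueError("row needs at least one observed attribute")
    # Compare only on the attributes that the incomplete row actually has.
    reference = [row[j] for j in observed]
    candidates = [[r[j] for j in observed] for r in complete_rows]
    grades = grey_relational_grades(reference, candidates, rho)
    # Indices of the k most similar complete rows.
    top = sorted(range(len(complete_rows)), key=lambda i: grades[i], reverse=True)[:k]
    total = sum(grades[i] for i in top)
    filled = list(row)
    for j in missing:
        # Grade-weighted mean of the neighbours' values for the missing attribute.
        filled[j] = sum(grades[i] * complete_rows[i][j] for i in top) / total
    return filled


complete = [[1.0, 2.0, 3.0], [1.0, 2.0, 5.0], [9.0, 9.0, 9.0]]
print(impute_row([1.0, 2.0, None], complete, k=2))
```

Weighting by grade means that complete samples which track the incomplete one more closely on its observed attributes contribute more to the imputed value; with `k=1` the sketch reduces to copying the value from the single most related sample.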

Keywords/Search Tags:

imbalanced data set, data missing, data value imputation, transfer learning

Related items

1	Studies On Missing Data Imputation
2	Imbalanced-type Incomplete Data And Missing Value Imputations Based On TS Modeling
3	Research On Imbalanced Data Sparsity Problems
4	Research On Adaptive And Robust Missing Value Imputation Algorithm
5	Research On Passenger Transport Data Quality Detection And Missing Data Imputation
6	Attribute Correlation Modeling And Missing Value Imputation Of Incomplete Data Based On Fuzzy Partition
7	Research On Missing Data Imputation Based On Tensor Decomposition
8	Research And Implementation Of Imputation Method For Missing Data In The Trash Pickup Logistics Management System
9	Research On Data Imputation Methods Of Mixed Missing Type
10	Research On Data Imputation Methods Oriented Specific Domains