Research On Handling Missing Data Based On Statistical Learning

Posted on:2013-07-29

Degree:Master

Type:Thesis

Country:China

Candidate:L Cao

Full Text:PDF

GTID:2248330377959112

Subject:Computer software and theory

Abstract/Summary:

With the rapid development of the Internet, social digitization, and the economy, the volume of data is growing at an astonishing speed. Extracting valuable information from large data sets has become increasingly important, which gave rise to data mining technology, and data mining is attracting growing attention. Most data mining algorithms and models assume an ideal, complete data set; real data, however, is often incomplete, that is, it contains missing values. The missing data is usually handled first by some method so that mining can then proceed on a complete data set.

There are many imputation methods for estimating missing data, each with its own advantages and disadvantages. Building on extensive study of missing data, this thesis proposes a method for handling missing data that consists of four major steps: variable selection, regression imputation, cluster analysis, and regression imputation. Because the method draws heavily on statistical learning, it is called the method of handling missing data based on statistical learning. In addition, for the cluster analysis used in the new method, we study the advantages and disadvantages of K-means in depth and propose an improved clustering algorithm. We then propose a complete cleaning process flow for handling missing values.

Finally, we conduct experiments on a data set with cluster structure, a random data set, and a real data set. Comparison with other methods for handling missing data shows the effectiveness of the proposed method of handling missing data based on statistical learning.
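The abstract names four steps (variable selection, regression imputation, cluster analysis, regression imputation) but gives no implementation details. The following is only a minimal sketch of one plausible reading, not the thesis's actual algorithm. Every name in it (`pearson`, `fit_line`, `kmeans`, `impute`) is hypothetical, and the specific choices are assumptions: the predictor is picked by absolute Pearson correlation, the regressions are single-variable least squares, and the clustering is plain K-means with deterministic seeding rather than the improved variant the thesis proposes.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5 if sxx and syy else 0.0

def fit_line(xs, ys):
    """Least-squares fit y = a + b*x; slope is 0 if x is constant."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return my - b * mx, b

def kmeans(points, k, iters=20):
    """Plain K-means; seeds are evenly spaced points of the sorted input."""
    ordered = sorted(points)
    centers = [ordered[i * (len(ordered) - 1) // max(k - 1, 1)] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [
            min(range(k), key=lambda c: sum((p - q) ** 2 for p, q in zip(pt, centers[c])))
            for pt in points
        ]
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = tuple(mean(col) for col in zip(*members))
    return labels

def impute(rows, target, k=2):
    """Fill None entries in column `target`; assumes other columns are complete."""
    complete = [r for r in rows if r[target] is not None]
    ys = [r[target] for r in complete]
    predictors = [j for j in range(len(rows[0])) if j != target]
    # Step 1 (assumed form): variable selection by absolute correlation.
    best = max(predictors, key=lambda j: abs(pearson([r[j] for r in complete], ys)))
    # Step 2: provisional regression imputation on the whole data set.
    a, b = fit_line([r[best] for r in complete], ys)
    provisional = [r[target] if r[target] is not None else a + b * r[best] for r in rows]
    # Step 3: cluster on (selected predictor, provisional target).
    labels = kmeans([(r[best], y) for r, y in zip(rows, provisional)], k)
    # Step 4: refit within each cluster on observed rows and re-impute.
    result = [list(r) for r in rows]
    for c in range(k):
        obs = [(r[best], r[target]) for r, lab in zip(rows, labels)
               if lab == c and r[target] is not None]
        ca, cb = fit_line(*zip(*obs)) if len(obs) >= 2 else (a, b)
        for i, (r, lab) in enumerate(zip(rows, labels)):
            if lab == c and r[target] is None:
                result[i][target] = ca + cb * r[best]
    return result

# Toy data (invented for illustration): columns are [x, noise, y].
rows = [
    [1, 5, 2], [2, 1, 4], [3, 4, None], [4, 2, 8], [5, 3, 10],
    [21, 3, 79], [22, 5, 78], [23, 1, None], [24, 4, 76], [25, 2, 75],
]
completed = impute(rows, target=2, k=2)
print(completed[2][2], completed[7][2])
```

In this toy data the two groups follow different linear trends, so a single global regression fills the gaps only approximately, while the per-cluster refit in the last step can recover each group's own trend. That is the intuition this sketch is meant to convey; the thesis's actual procedure may differ in every step.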

Keywords/Search Tags:

data preprocessing, missing data, clustering, regression

Related items

1	Data Preprocessing And K-Means Clustering Based Support Vector Regression Model
2	Research And System Construction Of Data Preprocessing Mechanism
3	Research And Application On Data Preprocessing Algorithms
4	Research On Improvement ELM Based Filling Approach Of Missing Data
5	Research And Application Of Data Mining Algorithms Based On Data Preprocessing And Regression Analysis Techniques
6	Research On Data Remediation Method Of Marketplace Passenger Data Missing
7	Researches On The Classification Of Imbalanced Data With Missing Values
8	Application And Research Of Intelligent Algorithms In Industrial Data Preprocessing
9	Research Of Data Preprocessing For Data-driven Modeling
10	Research On Clustering Preprocessing Of Data Resource And Its Application