A Novel Missing Data Imputation Method Based On K-means Algorithm And Association Rules

Posted on: 2015-09-23
Degree: Master
Type: Thesis
Country: China
Candidate: C Wang
Full Text: PDF
GTID: 2348330518970633
Subject: Engineering
Abstract/Summary:
With the rapid development of technology, industries of all kinds now routinely use computers to manage information and have accumulated very large volumes of data. When data are extracted and analyzed, missing values and their consequences are hard to avoid. The main consequences are: key information is lost from the system; uncertain factors play a larger role than they should; and standard analysis methods become difficult or impossible to apply to the data set. In addition, missing values disrupt the analysis process, reduce the accuracy of the results, and can even make them unreliable. It is therefore essential to work out how to handle missing data.

This paper presents a novel missing data imputation method based on the K-means algorithm and association rules. The method integrates the two algorithms to achieve better performance. K-means clustering increases the similarity of the data within each cluster, so that the association rule mining algorithm can find more strong association rules. Association rule mining, in turn, addresses the low imputation accuracy of the K-means clustering algorithm. The combined approach effectively addresses the missing data imputation problem and improves the accuracy of the imputed values.

This paper also analyzes the original K-means clustering algorithm and presents a new method for selecting K by calculating the distance gap between different data clusters. Based on this method, the paper gives a reasonable value of K and verifies it experimentally. Finally, the paper analyzes the original association rule mining algorithm and presents a new solution to two long-standing problems: the absence of any suitable association rule and conflicts between association rules.
Keywords/Search Tags: missing data imputation, K-means clustering, association rules, data preprocessing
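
The abstract gives only a high-level description of the method, so the following Python sketch is an illustration of the general idea rather than the thesis's actual algorithm. It assumes each record has complete numeric features used for clustering and categorical attributes that may be missing (`None`). The function names (`kmeans`, `mine_rules`, `impute`), the support and confidence thresholds, and the single-antecedent rule form are all hypothetical. Two choices in particular are stand-ins for the thesis's own solutions, which the abstract does not describe: when no rule applies, the sketch falls back to the most common value in the cluster, and when rules conflict, it keeps the one with the highest confidence. K is passed in directly; the distance-gap selection of K is not reproduced here.

```python
from collections import Counter, defaultdict


def kmeans(points, k, iters=20):
    """Plain Lloyd's K-means on numeric tuples.

    Assumption: the first k points seed the centroids, and k <= len(points).
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def mine_rules(rows, target, min_support=0.2, min_conf=0.6):
    """Mine single-antecedent rules (attr = value) -> (target = value).

    Returns {(attr, value): (predicted_target_value, confidence)}.
    The thresholds are illustrative assumptions.
    """
    complete = [r for r in rows if r[target] is not None]
    counts = defaultdict(Counter)
    for r in complete:
        for attr, val in r.items():
            if attr != target and val is not None:
                counts[(attr, val)][r[target]] += 1
    rules = {}
    for antecedent, by_value in counts.items():
        value, hits = by_value.most_common(1)[0]
        confidence = hits / sum(by_value.values())
        if hits / len(complete) >= min_support and confidence >= min_conf:
            rules[antecedent] = (value, confidence)
    return rules


def impute(rows, numeric, target, k):
    """Cluster on numeric features, then fill missing `target` values per cluster."""
    labels = kmeans(numeric, k)
    filled = [dict(r) for r in rows]  # copy so the input is left untouched
    for c in set(labels):
        idx = [i for i, l in enumerate(labels) if l == c]
        cluster_rows = [rows[i] for i in idx]
        rules = mine_rules(cluster_rows, target)
        observed = Counter(r[target] for r in cluster_rows if r[target] is not None)
        # Assumed fallback when no rule applies: the cluster's most common value.
        fallback = observed.most_common(1)[0][0] if observed else None
        for i in idx:
            if rows[i][target] is None:
                matches = [rules[(a, v)] for a, v in rows[i].items() if (a, v) in rules]
                # Assumed conflict resolution: keep the highest-confidence rule.
                filled[i][target] = (
                    max(matches, key=lambda m: m[1])[0] if matches else fallback
                )
    return filled
```

Calling `impute(rows, numeric, "region", 2)` on a list of record dictionaries and a parallel list of numeric tuples returns copies of the records with the missing `region` values filled in. Mining rules inside each cluster rather than over the whole data set mirrors the abstract's claim that clustering first makes strong rules easier to find.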