The Analysis And Improvement Research Of KNN-Imputation Algorithm

Posted on:2011-01-31

Degree:Master

Type:Thesis

Country:China

Candidate:L C Huang

Full Text:PDF

GTID:2178360305978002

Subject:Computer software and theory

Abstract/Summary:

Data mining is a new and active research area. After rapid development over the past 10 years, many mature algorithms have been developed for effectively handling massive data, and these algorithms and techniques perform well in the field of data mining. However, most of the problems to which data mining is applied come from real life, and the data generated and collected from real life are usually full of noise, inconsistencies, missing values, and so on. Data pre-processing therefore plays an increasingly important role in the data mining process.
Among the problems of real-world data, missing data is the most common. Most classical algorithms developed so far have great difficulty dealing with missing data, because the causes of missing data are complex and the data generation process differs from one application to another. Algorithm designers therefore usually assume that the data were collected under ideal conditions. Mining data with missing values, however, seriously affects both the mining process and its outcome, and can even lead to wrong models and conclusions. As a result, there is a large gap between data mining algorithms and the data actually available.
Many scholars in China and abroad have studied how to prevent, avoid, and handle missing data. This research draws on results from statistics, machine learning, probability, and other fields. Many algorithms developed in the field of data imputation have proved very successful in experiments and industrial applications.
In general, even when an imputation algorithm performs only moderately well, it clearly helps improve mining algorithms and their results.
This thesis analyzes and improves the KNN algorithm, a widely used algorithm with good scalability and adaptability. The KNN algorithm is a generalization of the NN (nearest neighbor) algorithm, which was first proposed by Cover and Hart in 1967, originally for classification. The basic idea is to use nearby cases whose class labels are already known to classify unknown cases. Because it is easy to understand and implement and is widely applicable, the algorithm has been widely used since its proposal in classification, clustering, information retrieval and query, and missing data imputation. The KNN imputation algorithm is an improved version of the NN imputation algorithm for missing data imputation.
The KNN imputation algorithm uses the data points near a missing point to estimate the missing value and fill it in. The traditional KNN imputation algorithm has many deficiencies; for example, its computational cost is considerable. Many improved KNN imputation algorithms exist, and they usually focus on the distance measure, the distance calculation, the calculation of the imputed value, and the storage and indexing of results.
In the literature, an imputation method typically applies one particular algorithm, or a fixed sequence of several algorithms, to impute the entire data set. With the classification of missing data proposed in this thesis, by contrast, different missing values are imputed by different methods even within the same data set. Algorithms, especially those based on density and neighbors, should classify the missing data and impute each class with an appropriate method, rather than treating all the data in the same way with a single algorithm.
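As an illustration of the traditional KNN imputation idea described above (estimating a missing value from nearby data points), the following is a minimal sketch and not the thesis's own implementation. The function name `knn_impute`, the use of `None` to mark a missing value, the Euclidean distance over observed attributes, and the unweighted mean of the neighbors are all assumptions made for this example.

```python
import math


def knn_impute(rows, k=3):
    """Fill None entries with the mean of the k nearest complete rows.

    Hypothetical helper for illustration only: it assumes numeric
    attributes and uses only fully observed rows as candidate neighbors.
    """
    complete = [r for r in rows if None not in r]
    if not complete:
        raise ValueError("need at least one complete row to impute from")

    imputed = []
    for row in rows:
        missing = [j for j, v in enumerate(row) if v is None]
        if not missing:
            imputed.append(list(row))
            continue

        observed = [j for j, v in enumerate(row) if v is not None]

        # Euclidean distance restricted to the attributes observed in this row.
        def dist(other):
            return math.sqrt(sum((row[j] - other[j]) ** 2 for j in observed))

        # Comparing against every complete row is what makes the
        # traditional algorithm computationally expensive.
        neighbors = sorted(complete, key=dist)[:k]

        filled = list(row)
        for j in missing:
            filled[j] = sum(n[j] for n in neighbors) / len(neighbors)
        imputed.append(filled)
    return imputed


data = [[1.0, 2.0], [1.1, 2.2], [5.0, 10.0], [1.05, None]]
print(knn_impute(data, k=2))
```

In this sketch every missing value is treated the same way, which is exactly the uniform treatment the thesis argues against; a classification-based scheme would choose a different method, or decline to impute, depending on the kind of missing point.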
The main work of this thesis is, within the framework of classification-based imputation, to propose the K-1NN algorithm and the CNN algorithm based on the geometric center. Combined with partial imputation, it identifies two types of missing points that are not suitable for imputation and separates these two classes into another part. On this basis, the thesis proposes two imputation algorithms: the PKNN and PCNN imputation algorithms. Experiments confirm that the classification-based imputation method and the partial imputation strategy effectively improve the accuracy of the KNN imputation algorithm.

Keywords/Search Tags:

Missing data imputation, partial imputation, classification-based imputation, KNN imputation algorithm

Related items

1	Studies On Missing Data Imputation
2	Nonparametric Imputation For Missing Data
3	Comparative Study On Imputation Methods Of Missing Data In XGBOOST Model Under Complete Random Missing Mechanism
4	Research On Adaptive And Robust Missing Value Imputation Algorithm
5	The Online Imputation Method Of Missing Value Based On KNN And Its Application In Credit Evaluation
6	Research And Implementation Of Imputation Method For Missing Data In The Trash Pickup Logistics Management System
7	Attribute Associated Neuron Modeling And Missing Value Imputation Based On Neural Network
8	Research On Missing Value Imputation Of Incomplete Data
9	Research On Missing Value Imputation Method Based On Mixed Information System
10	Research On Data Imputation Methods Oriented Specific Domains