
Study of Completing Missing Data

Posted on: 2012-01-19
Degree: Master
Type: Thesis
Country: China
Candidate: C M Jin
GTID: 2178330332496988
Subject: Computer application technology
Abstract/Summary:
Missing data is a pervasive problem in much of today's research and investigation. It makes analysis considerably more difficult, can produce unreliable results, and lowers the efficiency of the whole statistical procedure. In particular, when the fully observed and the incompletely observed cases differ systematically, results obtained by applying conventional statistical methods to the incomplete data set cannot stand in for the population as a whole. Traditional techniques for replacing missing data can have serious limitations, while recent developments in computing allow more sophisticated techniques to be used.

Data Mining (Knowledge Discovery in Databases) is the process of extracting usable, credible, valid, and comprehensible patterns from large-scale data in an intelligent and automatic way. Data reinforcement is one of the most important directions in Data Mining. This thesis studies the theory of missing-data imputation:
1. It describes the research background, the state of current research, and the classification of missing-data mechanisms, and explains the basic concepts of missing-data imputation.
2. It compares the efficacy of four current and promising methods for dealing with missing data, judged by the percent bias of the parameter estimates each method yields.
3. Its focus is a new relationship matrix model. For every pair of objects, the new relationship matrix records whether their condition attributes and their decision attributes agree or differ. Based on this matrix, the algorithm mines potential links between objects and completes the missing data, and the completed results do not undermine the consistency of the system.
4. Two groups of experiments validate the algorithm. Experiment One compares the recovery rate of the mean method, the conditional mean method, and the proposed algorithm on three data sets from the UCI Machine Learning Repository. Experiment Two focuses on completion accuracy under different deletion rates, covering seven levels of incomplete data.
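The abstract does not spell out how the new relationship matrix is built, so the following is only a hypothetical Python sketch of the general idea, not the thesis's algorithm. It assumes discrete attribute values with `None` marking a missing value; it takes the "mean method" and "conditional mean method" to be the most common value over all objects or over objects sharing the same decision (a common convention for symbolic attributes); and it treats a relationship-matrix entry for a pair of objects as recording, attribute by attribute, whether their known condition values agree, differ, or cannot be compared, plus whether their decisions agree. A gap is filled by voting among objects that share the decision and never disagree on a known condition value, and a candidate is rejected if it would give two objects identical condition values but different decisions. All function names and the toy table are illustrative inventions.

```python
from collections import Counter

MISSING = None  # marker for an unknown value (often written "*" in rough-set papers)


def _most_common(values):
    """Most frequent known value, used here as the 'mean' of a discrete attribute."""
    known = [v for v in values if v is not MISSING]
    return Counter(known).most_common(1)[0][0] if known else MISSING


def mean_complete(rows, conditional=False):
    """Mean method; with conditional=True only objects sharing the decision are pooled."""
    completed = [(list(cond), dec) for cond, dec in rows]
    for i, (cond, dec) in enumerate(rows):
        for a, value in enumerate(cond):
            if value is MISSING:
                pool = [c[a] for c, d in rows if not conditional or d == dec]
                completed[i][0][a] = _most_common(pool)
    return completed


def relationship_matrix(rows):
    """M[i][j] = (per-attribute flags, same decision?) with flags True/False/None
    meaning equal / different / not comparable because a value is missing."""
    n = len(rows)
    M = [[None] * n for _ in range(n)]
    for i, (ci, di) in enumerate(rows):
        for j, (cj, dj) in enumerate(rows):
            if i != j:
                flags = tuple(None if x is MISSING or y is MISSING else x == y
                              for x, y in zip(ci, cj))
                M[i][j] = (flags, di == dj)
    return M


def _conflicts(cond, dec, rows, skip):
    """True if another object has the same fully known condition values but a different decision."""
    return any(k != skip and d != dec
               and all(x is not MISSING and x == y for x, y in zip(cond, c))
               for k, (c, d) in enumerate(rows))


def relation_complete(rows):
    """Fill each gap by voting among linked objects: same decision, no known condition value in conflict."""
    M = relationship_matrix(rows)
    completed = [(list(cond), dec) for cond, dec in rows]
    for i, (cond, dec) in enumerate(rows):
        for a, value in enumerate(cond):
            if value is not MISSING:
                continue
            linked = [j for j in range(len(rows))
                      if j != i and M[i][j][1] and False not in M[i][j][0]
                      and rows[j][0][a] is not MISSING]
            votes = Counter(rows[j][0][a] for j in linked)
            for candidate, _ in votes.most_common():
                trial = completed[i][0][:]
                trial[a] = candidate
                # Reject values that would make the decision table inconsistent.
                if not _conflicts(trial, dec, completed, i):
                    completed[i][0][a] = candidate
                    break
            # If no candidate survives, the value is left missing.
    return completed


def recovery_rate(original, incomplete, completed):
    """Share of deleted values that a method restored exactly."""
    hits = total = 0
    for (co, _), (ci, _), (cc, _) in zip(original, incomplete, completed):
        for vo, vi, vc in zip(co, ci, cc):
            if vi is MISSING:
                total += 1
                hits += vo == vc
    return hits / total if total else 1.0


# Toy decision table: (temperature, headache, cough) -> diagnosis.
ORIGINAL = [
    (["high", "yes", "no"], "flu"), (["high", "yes", "no"], "flu"),
    (["normal", "no", "yes"], "flu"), (["normal", "no", "yes"], "flu"),
    (["normal", "no", "yes"], "flu"), (["low", "yes", "no"], "cold"),
    (["low", "yes", "no"], "cold"), (["low", "yes", "yes"], "cold"),
]
INCOMPLETE = [(list(c), d) for c, d in ORIGINAL]
INCOMPLETE[0][0][2] = MISSING  # delete object 0's cough value ("no")
INCOMPLETE[5][0][1] = MISSING  # delete object 5's headache value ("yes")

for name, method in [("mean", mean_complete),
                     ("conditional mean", lambda t: mean_complete(t, conditional=True)),
                     ("relationship matrix", relation_complete)]:
    print(f"{name}: {recovery_rate(ORIGINAL, INCOMPLETE, method(INCOMPLETE)):.2f}")
# prints 0.50, 0.50 and 1.00 for the three methods, in that order
```

On this toy table the most common cough value among flu objects is "yes", so both mean methods miss object 0's deleted "no", while the only flu object that never contradicts object 0 on a known attribute supplies the right value. The example only illustrates the idea of mining links between objects; it says nothing about the results reported in the thesis.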
Keywords/Search Tags: data reinforcement, new relationship matrix, incomplete information table, rough set, collision avoidance