
Research On Data Imputation Methods Of Mixed Missing Type

Posted on: 2018-12-11
Degree: Master
Type: Thesis
Country: China
Candidate: L Wang
Full Text: PDF
GTID: 2348330512487341
Subject: Software engineering
Abstract/Summary:
With the development of science and technology, greatly improved capacity for data acquisition and storage has led to a rapid expansion in data scale. This brings more opportunities for data mining and data analysis, but various data quality problems also pose a huge challenge, and missing data is one of the key problems affecting data quality. A large number of missing values in a database not only seriously degrades query quality but also affects the correctness of data mining and data analysis, and can in turn mislead decisions. This thesis therefore focuses on data imputation. Many data imputation methods exist, but most target a single missing type, whereas complex large-scale data often contains mixed missing types, so existing methods alone cannot achieve good imputation results. Our study of imputation methods therefore focuses on the complex situation in which different missing types appear together in incomplete data.

The work is summarized as follows. First, based on the characteristics of normal missing values and the principle of association rules, this thesis proposes a data imputation method based on weakly usable itemsets, which addresses two problems. (1) Because frequent pattern mining is time-consuming, an association rule mining method based on a Boolean matrix is proposed; it exploits the characteristics of the Boolean matrix to reduce I/O operations and improve mining efficiency. (2) Because the imputation rate suffers when few rules are available, a data imputation method based on weakly usable itemsets is proposed, which establishes a connection between frequent itemsets and weakly usable itemsets. On the one hand, using connection rules improves the imputation rate; on the other hand, using mutually exclusive rules provides an effective basis for calculation in the next step.

Then, based on the characteristics of abnormal missing values and the idea of recommendation algorithms, this thesis proposes a data imputation method based on tuple similarity. Building inverted lists improves query efficiency, calculating tuple similarity based on attribute contribution improves accuracy, and the final imputed value is then chosen from the top-k scoring candidates.

Finally, two data sets from the UCI Repository are used in simulation experiments. The experimental results show that our method is more effective on incomplete data with mixed missing types.
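
The abstract gives no details of the Boolean-matrix mining step, so the following is only a minimal sketch of the general idea, not the thesis's algorithm. It assumes each transaction is a set of items (for imputation, an item could be an attribute-value pair), encodes the data once as a Boolean matrix, and then counts the support of an itemset as the number of rows in which all of its columns are true. The function names `to_boolean_matrix`, `support`, and `frequent_itemsets` are hypothetical.

```python
def to_boolean_matrix(transactions, items):
    """One row per transaction, one column per item; True means the item is present."""
    return [[item in t for item in items] for t in transactions]


def support(matrix, columns):
    """Number of rows in which every selected column is True (row-wise logical AND)."""
    return sum(all(row[c] for c in columns) for row in matrix)


def frequent_itemsets(transactions, min_support):
    """Level-wise search over the Boolean matrix (hypothetical helper, not from the thesis).

    Returns a dict mapping each frequent itemset (a sorted tuple of items)
    to its support count.
    """
    items = sorted({item for t in transactions for item in t})
    # The raw data is read only here; all later counting uses the in-memory matrix.
    matrix = to_boolean_matrix(transactions, items)

    frequent_single = [c for c in range(len(items))
                       if support(matrix, (c,)) >= min_support]
    result = {}
    current = [(c,) for c in frequent_single]
    while current:
        next_level = []
        for cols in current:
            result[tuple(items[c] for c in cols)] = support(matrix, cols)
            # Extend only with higher-indexed frequent columns, so each
            # candidate itemset is generated exactly once.
            for c in frequent_single:
                if c > cols[-1]:
                    candidate = cols + (c,)
                    if support(matrix, candidate) >= min_support:
                        next_level.append(candidate)
        current = next_level
    return result


transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]
result = frequent_itemsets(transactions, min_support=3)
```

In this sketch the transactions are scanned only when the matrix is built, which is one plausible reading of how a Boolean-matrix representation reduces I/O operations during mining.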
Keywords/Search Tags: data imputation, missing type, association rule, weakly usable itemsets, tuple similarity