
Research And Implementation Of Incomplete Data Processing Based On AP Clustering

Posted on: 2019-02-07
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Ying
Full Text: PDF
GTID: 2348330545462590
Subject: Computer technology
Abstract/Summary:
In the era of big data, data mining, which discovers latent relationships in data to support value assessment and decision guidance, has become a hot research topic. However, high-quality decisions depend on high-quality data, so it is very important to preprocess data before mining. In practice, various data quality problems increase the difficulty of preprocessing, and incomplete data cannot be avoided. This paper deals with incomplete data in preprocessing, and the main research contents are as follows.

1) For data that is missing at random, this paper proposes a K-nearest-neighbor filling algorithm based on incremental AP clustering (IAPSKNNI). First, the causes of incomplete data are analyzed, the existing processing methods are summarized, and a clustering-based processing strategy is chosen. To meet the requirement of dynamic processing, incremental AP clustering is applied to update the clustering results, so that the complete information in the data can be used to estimate the missing values. Meanwhile, the improved K-nearest-neighbor filling algorithm calculates the filling value without requiring the K value to be set. Finally, simulation experiments demonstrate the good filling performance of IAPSKNNI: when the missing rate is high and little information is available, IAPSKNNI obtains better filling values than other K-nearest-neighbor filling algorithms.

2) Based on IAPSKNNI, this paper designs and implements a data preprocessing system module. First, the work involved in preprocessing is analyzed and partitioned into several sub-tasks. Then, with regard to e-commerce data, the functional requirements of the data preprocessing system are analyzed and the corresponding sub-modules are designed. Based on the JDBC interface, four sub-modules are implemented in Java: data acquisition, data processing, data detection, and scheduling management. Finally, the complete preprocessing system module is built.
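The abstract does not spell out the details of IAPSKNNI, so the following is only a minimal sketch of the general idea of cluster-restricted nearest-neighbor filling, not the algorithm from the thesis. It assumes the cluster labels have already been produced by some clustering step (for example AP clustering) and are simply passed in. The function name `fill_missing`, the use of `None` as the missing-value marker, and the inverse-distance weighting over all complete records in the same cluster (one simple way to avoid choosing K) are all illustrative assumptions.

```python
import math


def fill_missing(rows, labels):
    """Fill None entries using complete rows from the same cluster.

    rows   : list of lists of floats, with None marking a missing value
    labels : one cluster label per row (assumed to come from a separate
             clustering step; no clustering is performed here)

    Returns a new list of rows; the input is not modified.
    """
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        missing = [j for j, v in enumerate(row) if v is None]
        if not missing:
            continue
        observed = [j for j, v in enumerate(row) if v is not None]

        # Donors: complete rows that share this row's cluster label.
        donors = [r for r, lab in zip(rows, labels)
                  if lab == labels[i] and None not in r]
        if not donors:
            continue  # nothing to estimate from, so the gap is left as-is

        # Inverse-distance weights, computed on the observed attributes only.
        # The small constant avoids division by zero for identical records.
        weights = []
        for d in donors:
            dist = math.sqrt(sum((row[j] - d[j]) ** 2 for j in observed))
            weights.append(1.0 / (dist + 1e-9))
        total = sum(weights)

        # Each missing attribute becomes the weighted mean of the donors.
        for j in missing:
            filled[i][j] = sum(w * d[j] for w, d in zip(weights, donors)) / total
    return filled
```

Because every complete record in the cluster acts as a neighbor, no K has to be chosen in this sketch. A record whose cluster contains no complete record keeps its missing values, which a real system would need to handle separately.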
Keywords/Search Tags: incomplete data, filling, AP clustering, incremental, k-nearest-neighbor