Research And Implementation Of Incomplete Data Processing Based On AP Clustering

Posted on:2019-02-07

Degree:Master

Type:Thesis

Country:China

Candidate:Z Y Ying

GTID:2348330545462590

Subject:Computer technology

Abstract/Summary:

In the era of big data, data mining, which discovers latent relationships in data to support value assessment and decision guidance, has become an active research topic. High-quality decisions, however, depend on high-quality data, so preprocessing the data before mining is essential. In practice, various data quality problems make preprocessing harder, and incomplete data cannot be avoided. This thesis addresses incomplete data during preprocessing. The main research contents are as follows.

1) For data that is missing at random, this thesis proposes a K-nearest-neighbor filling algorithm based on incremental AP clustering (IAPSKNNI). First, the causes of incomplete data are analyzed, the existing processing methods are summarized, and a clustering-based processing strategy is chosen. To meet the requirement of dynamic processing, incremental AP clustering is applied to update the clustering results, so that the complete information in the data can be used to estimate the missing values. An improved K-nearest-neighbor filling algorithm then makes it possible to calculate the filling value without setting the K value. Finally, simulations and experiments demonstrate the good filling performance of IAPSKNNI: when the missing rate is high and little information is available, IAPSKNNI obtains better filling values than other K-nearest-neighbor filling algorithms.

2) Based on IAPSKNNI, this thesis designs and implements a data preprocessing system module. First, the work involved in preprocessing is analyzed and partitioned into several sub-tasks. For e-commerce data, the functional requirements of the preprocessing system are then analyzed and the corresponding sub-modules are designed. Four sub-modules are implemented in Java on top of the JDBC interface: data acquisition, data processing, data detection, and scheduling management. Together they form the complete preprocessing system module.
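
The filling idea in 1) can be illustrated with a minimal sketch. It is not the IAPSKNNI algorithm itself, whose details the abstract does not give. The sketch assumes the clusters and their exemplars have already been produced by some clustering step (AP clustering, i.e. affinity propagation, in the thesis), it uses a fixed k where the thesis avoids setting K, and it marks missing values with None. The function names partial_distance, knn_fill, and fill_within_cluster are hypothetical.

```python
import math


def partial_distance(a, b):
    """Euclidean distance over the attributes observed in both records,
    rescaled to the full number of attributes (assumed convention)."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf  # no attribute in common, so the records are incomparable
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(squared * len(a) / len(shared))


def knn_fill(record, complete_records, k=3):
    """Return a copy of record in which every missing value (None) is replaced
    by the mean of that attribute over the k nearest complete records."""
    if not complete_records:
        raise ValueError("at least one complete record is required")
    ranked = sorted(complete_records, key=lambda r: partial_distance(record, r))
    neighbors = ranked[:k]
    return [
        value if value is not None
        else sum(n[i] for n in neighbors) / len(neighbors)
        for i, value in enumerate(record)
    ]


def fill_within_cluster(record, clusters, exemplars, k=3):
    """Assign the incomplete record to the cluster with the nearest exemplar,
    then fill it from the complete records of that cluster only."""
    label = min(exemplars, key=lambda c: partial_distance(record, exemplars[c]))
    return knn_fill(record, clusters[label], k)


if __name__ == "__main__":
    # Toy data: two well-separated clusters of complete records (made up).
    clusters = {
        "a": [[1.0, 1.0], [1.2, 0.8], [0.8, 1.2]],
        "b": [[10.0, 10.0], [10.5, 9.5]],
    }
    exemplars = {"a": [1.0, 1.0], "b": [10.0, 10.0]}
    print(fill_within_cluster([1.1, None], clusters, exemplars, k=2))
```

Restricting the neighbor search to one cluster keeps records from unrelated groups out of the estimate, which is the motivation for combining clustering with nearest-neighbor filling.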

Keywords/Search Tags:

incomplete data, filling, AP clustering, incremental, k-nearest-neighbor

Related items

1	Clustering Incomplete Data Using Pseudo Nearest Neighbor And Interval-valued Distance
2	Research Of Hybrid Clustering Algorithm For Incomplete Data Based On Interval Estimation
3	Research On Incremental Clustering Algorithm
4	Research Of Hybrid Clustering Algorithm For Incomplete Data Based On Local Weighting
5	Outlier Detection Algorithm And Its Parallelization Based On Weighted K-Nearest Neighbor
6	Research On Noisy Data Clustering Algorithm Based On Natural Nearest Neighbor
7	Study On Generalized Nearest Neighbor Pattern Classification
8	The Application Research Of Incremental Clustering For Document Update Sumarization
9	The Research And Application Of Clustering Algorithm Based On Density
10	Research And Application On Three-Decision KNN Algorithm Based On Incremental Learning