Research And Application Of Incomplete Data Imputation Algorithm Based On Subtractive Clustering

Posted on:2015-11-15

Degree:Master

Type:Thesis

Country:China

Candidate:L Zhao

Full Text:PDF

GTID:2298330467984593

Subject:Computer application technology

Abstract/Summary:

In recent years, data volumes have grown at an unprecedented rate as network applications have spread. However, collected raw data usually contain missing values owing to device failures and network instability during acquisition and transmission, and incomplete data are difficult to model and study. If missing values are not treated effectively, especially in large datasets, the potential value of the data cannot be fully exploited. It is therefore meaningful to analyze and impute incomplete data.

Traditional data imputation methods generally use the whole dataset to fill in missing values and ignore the clustering features of data objects, so the filled values are easily affected by unrelated data. Moreover, existing incomplete data imputation algorithms have high time complexity and do not support distributed processing. They are therefore not suited to the processing requirements of a big data environment.

To address these problems, this paper makes the following contributions. First, it proposes a novel subtractive clustering algorithm for incomplete data based on MapReduce. The algorithm clusters incomplete data points directly by means of a newly designed similarity metric, and it uses matrix multiplication to compute the distances between data points, so that subtractive clustering can be parallelized as multiple MapReduce jobs. Second, it proposes an incomplete data imputation algorithm based on distributed subtractive clustering. Because different attributes may affect the clustering results differently, attribute weights are used when computing the distances between data points, and distance weights are derived from those distances. The algorithm then uses the distance weights and the clustering results to impute missing values without interference from data points in other clusters.
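
The abstract does not give the similarity metric or the MapReduce job layout, so the following is only a minimal single-machine sketch of the general idea. It assumes the well-known partial distance strategy (squared differences over jointly observed attributes, rescaled by the number of shared attributes) in place of the thesis's own metric, expresses all pairwise distances as matrix products over a zero-filled data matrix and an observation mask, and then applies the standard subtractive clustering potential update. The function names, the radius `ra`, and the simplified stopping rule `stop_ratio` are hypothetical choices made for illustration.

```python
import math

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def partial_sq_distances(data):
    """Pairwise squared distances over jointly observed attributes.

    Missing entries are None. Assumption: the partial distance strategy
    stands in for the thesis's similarity metric, which is not given.
    """
    p = len(data[0])
    mask = [[0.0 if v is None else 1.0 for v in row] for row in data]
    zero = [[0.0 if v is None else float(v) for v in row] for row in data]
    sq = [[v * v for v in row] for row in zero]
    # Three matrix products give sum_k m_ik * m_jk * (x_ik - x_jk)^2.
    a = matmul(sq, transpose(mask))
    b = matmul(mask, transpose(sq))
    c = matmul(zero, transpose(zero))
    shared = matmul(mask, transpose(mask))  # number of shared attributes
    n = len(data)
    d2 = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if shared[i][j] > 0:
                raw = max(a[i][j] + b[i][j] - 2.0 * c[i][j], 0.0)
                d2[i][j] = raw * p / shared[i][j]
    return d2

def subtractive_centers(d2, ra=1.0, stop_ratio=0.15):
    """Pick cluster centers by repeatedly taking the highest-potential point."""
    alpha = 4.0 / ra ** 2
    beta = 4.0 / (1.5 * ra) ** 2
    n = len(d2)
    potential = [sum(math.exp(-alpha * d) for d in row) for row in d2]
    first_peak = max(potential)
    centers = []
    while len(centers) < n:
        c = max(range(n), key=potential.__getitem__)
        peak = potential[c]
        # Simplified stopping rule (an assumption of this sketch).
        if peak < stop_ratio * first_peak:
            break
        centers.append(c)
        # Subtract the influence of the new center from every potential.
        potential = [pi - peak * math.exp(-beta * d2[i][c])
                     for i, pi in enumerate(potential)]
    return centers

# Two well-separated groups, each point missing one attribute.
data = [
    [0.0, 0.1, None], [0.1, None, 0.0], [None, 0.0, 0.1],
    [5.0, 5.1, None], [5.1, None, 5.0], [None, 5.0, 5.1],
]
d2 = partial_sq_distances(data)
centers = subtractive_centers(d2)
```

Writing the distance computation as matrix products is what makes it a candidate for block-wise distribution, since each product can be split into independent row and column blocks; how the thesis maps these onto MapReduce jobs is not stated in the abstract.
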
Finally, this paper builds a bridge monitoring simulation system based on the Internet of Things (IoT). A plug-and-play sensor module, a solar tracking system, and a stacked wireless module are designed to improve data acquisition and transmission, which makes it convenient to collect big data from the bridge monitoring system. The proposed algorithms are then used to impute the missing values.

The simulation results demonstrate that this work can cluster incomplete big data quickly and impute missing data effectively, and can therefore meet the requirements of big data processing. This paper thus contributes to the research and development of big data technologies.
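
The cluster-restricted, weighted imputation step described in the abstract can be illustrated with a small sketch. The exact weighting formulas are not given in the abstract, so this example assumes an attribute-weighted partial distance and inverse-distance donor weights; `weighted_partial_distance`, `impute_within_clusters`, `attr_weights`, and `eps` are hypothetical names, and the cluster labels are taken as given.

```python
def weighted_partial_distance(x, y, attr_weights):
    """Attribute-weighted squared distance over attributes observed in both rows."""
    num = den = 0.0
    for xv, yv, w in zip(x, y, attr_weights):
        if xv is not None and yv is not None:
            num += w * (xv - yv) ** 2
            den += w
    return num / den if den > 0 else float("inf")

def impute_within_clusters(data, labels, attr_weights, eps=1e-6):
    """Fill each missing value from donors in the same cluster only.

    Assumption: donors are weighted by 1 / (distance + eps); the thesis's
    actual distance weights are not specified in the abstract.
    """
    filled = [list(row) for row in data]
    for i, row in enumerate(data):
        for k, value in enumerate(row):
            if value is not None:
                continue
            num = den = 0.0
            for j, other in enumerate(data):
                # Skip the row itself, other clusters, and donors missing attribute k.
                if j == i or labels[j] != labels[i] or other[k] is None:
                    continue
                d = weighted_partial_distance(row, other, attr_weights)
                if d == float("inf"):
                    continue  # no shared attributes, so no usable distance
                w = 1.0 / (d + eps)
                num += w * other[k]
                den += w
            if den > 0:
                filled[i][k] = num / den  # stays None if no donor exists
    return filled

# Row 1 is missing its second attribute; row 3 belongs to another cluster.
readings = [[1.0, 2.0], [1.5, None], [3.0, 4.0], [100.0, 200.0]]
labels = [0, 0, 0, 1]
filled = impute_within_clusters(readings, labels, attr_weights=[1.0, 1.0])
```

In this toy input the outlying row in cluster 1 never contributes to the filled value, which is the behaviour the abstract attributes to imputing within clusters.
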

Keywords/Search Tags:

Big Data, Subtractive Clustering, Data Imputation, Internet of Things, Bridge Monitoring

Related items

1	Indoor Environment Data Acquisition And Monitoring System Based On Internet Of Things And Machine Learning
2	Design And Implementation Of A Factory Production Data Monitoring System Based On The Internet Of Things
3	Study On Big Data Cluster Analysis Method And Technology Of Internet Of Things
4	Research On Wireless Link Data Format And Network Topology For Low-energy Internet Of Things
5	Research On Data Cleaning Based On Clustering
6	Studies On Missing Data Imputation
7	Research On Key Techniques Of Internet Of Things Remote Monitoring Based On Mobile Mode
8	Research And Application Of Data Visualization On Internet Of Things Monitoring
9	Research On Hybrid Algorithm Based On Subtractive Clustering
10	The Analysis And Improvement Research Of Knn-imputation Algorithm