
Research And Application Of Incomplete Data Imputation Algorithm Based On Subtractive Clustering

Posted on: 2015-11-15
Degree: Master
Type: Thesis
Country: China
Candidate: L Zhao
Full Text: PDF
GTID: 2298330467984593
Subject: Computer application technology
Abstract/Summary:
In recent years, data volumes have grown at an unprecedented rate as network applications have spread. However, collected raw data usually contain missing values caused by device failures and network instability during acquisition and transmission, and incomplete data are difficult to model and study. If the missing values are not treated effectively, especially in large datasets, the potential value of the data cannot be fully exploited. It is therefore worthwhile to analyze and impute incomplete data.
Traditional data imputation methods generally use the whole dataset to fill in missing values and ignore the clustering structure of the data objects, so the filled values are easily affected by unrelated data. Moreover, existing incomplete data imputation algorithms have high time complexity and are not designed for distributed processing, which makes them unsuitable for big data environments.
To address these problems, this thesis makes the following contributions. First, it proposes a novel subtractive clustering algorithm for incomplete data based on MapReduce. The algorithm clusters incomplete data points directly by means of a newly designed similarity metric, and it computes the distances between data points using matrix multiplication, so that subtractive clustering can be parallelized as multiple MapReduce processes. Second, it proposes an incomplete data imputation algorithm based on distributed subtractive clustering. Because different attributes may affect the clustering results differently, attribute weights are used when computing the distances between data points, and distance weights are derived from these distances. The distance weights and the clustering results are then used to impute missing values without interference from data points in other clusters.
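The abstract does not give the similarity metric, the weighting formulas, or the MapReduce decomposition, so the following is only a minimal single-machine sketch of the general idea, under stated assumptions. It assumes a partial-distance metric (squared differences over jointly observed attributes, rescaled by the share of attribute weight actually observed), the standard subtractive clustering potential with a simplified stopping rule, and inverse-distance weights for imputation. The function names and the parameters `ra`, `rb`, `stop_ratio`, and `attr_w` are hypothetical and are not taken from the thesis. The distance matrix is written purely in terms of matrix products, which is the kind of formulation that could be split into blocks across MapReduce jobs.

```python
import numpy as np

def partial_sq_distances(X, attr_w=None):
    """Weighted squared distances over jointly observed attributes.

    Assumed metric (not from the thesis): missing entries are NaN, and each
    pairwise sum is rescaled by total weight / weight of shared attributes.
    """
    w = np.ones(X.shape[1]) if attr_w is None else np.asarray(attr_w, float)
    M = (~np.isnan(X)).astype(float)       # 1 where observed, 0 where missing
    Z = np.where(np.isnan(X), 0.0, X)      # zero-filled copy of the data
    Z2w = Z * Z * w
    # sum_k w_k m_ik m_jk (x_ik - x_jk)^2, expanded into three matrix products
    sq = Z2w @ M.T - 2.0 * (Z * w) @ Z.T + M @ Z2w.T
    shared = (M * w) @ M.T                 # weight of jointly observed attributes
    with np.errstate(divide="ignore", invalid="ignore"):
        D = np.where(shared > 0, sq * w.sum() / shared, np.inf)
    return np.maximum(D, 0.0)              # clamp tiny negative rounding errors

def subtractive_centers(D, ra=1.0, rb=None, stop_ratio=0.15, max_centers=10):
    """Pick cluster centers by subtractive clustering on squared distances D."""
    rb = 1.5 * ra if rb is None else rb
    P = np.exp(-4.0 * D / ra**2).sum(axis=1)   # potential of every point
    first = P.max()
    centers = []
    for _ in range(max_centers):
        c = int(np.argmax(P))
        if P[c] < stop_ratio * first:          # simplified stopping rule
            break
        centers.append(c)
        # reduce the potential of points near the newly chosen center
        P = P - P[c] * np.exp(-4.0 * D[:, c] / rb**2)
    return centers

def impute_within_clusters(X, D, centers):
    """Fill each missing value from observed values in the same cluster,
    weighted by inverse distance (an assumed form of distance weight)."""
    labels = np.argmin(D[:, centers], axis=1)  # assign to the nearest center
    filled = X.copy()
    for i, k in zip(*np.where(np.isnan(X))):
        peers = (labels == labels[i]) & ~np.isnan(X[:, k])
        if not peers.any():
            continue                           # nothing to borrow from
        wts = 1.0 / (D[peers, i] + 1e-9)
        if wts.sum() > 0:
            filled[i, k] = np.sum(wts * X[peers, k]) / wts.sum()
    return filled, labels

if __name__ == "__main__":
    X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, np.nan],
                  [5.0, 5.0], [5.1, 5.0], [np.nan, 5.0]])
    D = partial_sq_distances(X)
    centers = subtractive_centers(D, ra=1.0)
    filled, labels = impute_within_clusters(X, D, centers)
    print(centers, labels, filled, sep="\n")
```

Because each missing value is filled only from points that share its cluster label, a point in one cluster never contributes to an imputed value in another, which mirrors the property the abstract describes. A distributed version would need to partition the three matrix products and the potential update across workers; that part is not sketched here.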
Finally, this thesis builds a bridge monitoring simulation system based on the Internet of Things (IoT). A plug-and-play sensor module, a solar tracking system, and a stacked wireless module are designed to improve data acquisition and transmission, which makes it convenient to collect big data from the bridge monitoring system. The proposed algorithms are then used to impute the missing values.
The simulation results demonstrate that this work clusters incomplete big data quickly and imputes the missing data effectively, and can therefore meet the requirements of big data processing. In conclusion, this work contributes to the research and development of big data technologies.
Keywords/Search Tags: Big Data, Subtractive Clustering, Data Imputation, Internet of Things, Bridge Monitoring