Research On The Outlier Reduction Of Bridge Monitoring Data Based On Hadoop

Posted on: 2017-06-07
Degree: Master
Type: Thesis
Country: China
Candidate: J J Tan
Full Text: PDF
GTID: 2322330485981677
Subject: Software engineering
Abstract/Summary:
Bridges are an important part of road transportation, so ensuring their safety is essential. Bridge health monitoring is a common management method, and processing the monitoring data is one of its core tasks. Over time, a bridge monitoring system accumulates more and more data, and traditional data processing technology struggles to store and process data at this scale. Hadoop is a popular big data processing platform. Its two core components are the Hadoop distributed file storage system and the MapReduce computing framework, and together with supporting tools such as Hive and Sqoop it has grown into a big data processing ecosystem. Using the Hadoop platform to process large volumes of bridge monitoring data therefore has both theoretical significance and practical value.

Data mining is a commonly used data processing technique, and outlier mining is one of its active research topics. Outlier mining is currently applied in many fields, such as network intrusion detection and weather forecasting, but it has not received enough attention in the field of bridge monitoring. This thesis studies outlier mining in bridge monitoring data based on Hadoop. The research covers the following aspects.

First, to overcome the high computational overhead of the k-nearest neighbor outlier algorithm, this thesis combines partition theory, clustering, and the minimum limit of matrix theory to propose KMKNN, a k-nearest neighbor outlier algorithm based on K-means clustering partitions. KMKNN first uses K-means clustering to partition the original data, then prunes the partitions that cannot contain outliers, and finally applies the k-nearest neighbor outlier algorithm to find all outliers in the remaining partitions. Experiments show that, compared with the original algorithm, the new algorithm runs more efficiently.

Second, KMKNN has the disadvantage that the number of clusters must be set in advance and the initial cluster centers are selected randomly, so the clustering results are not accurate. To address this, the thesis combines canopy clustering and the maximum-minimum distance algorithm with K-means clustering to propose CMM-KMKNN, a k-nearest neighbor outlier algorithm built on all three. Experiments show that the new algorithm improves the accuracy of both clustering and outlier mining.

Third, the KMKNN and CMM-KMKNN algorithms require many iterations over the data, so their computational cost is high. The thesis therefore builds a Hadoop cluster, implements parallel versions of KMKNN and CMM-KMKNN on Hadoop, and uses them to mine outliers in the bridge monitoring data. The experimental results show that the parallel algorithms increase data processing speed and outlier mining accuracy.
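The partition, prune, and search steps described for KMKNN can be illustrated with a small single-machine sketch. This is not the thesis's implementation. It assumes the outlier score of a point is the distance to its k-th nearest neighbor and that the n highest-scoring points are reported, and it swaps in a simple pruning bound (a cluster holding more than k points and having radius r cannot contain a point whose score exceeds 2r) in place of the pruning rule used in the thesis, which the abstract does not spell out. All function and parameter names are hypothetical.

```python
import math
import random


def kmeans(points, n_clusters, iters=20, seed=0):
    """Plain Lloyd's K-means with randomly sampled initial centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, n_clusters)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest center.
        labels = [min(range(n_clusters), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Move each center to the mean of its members (empty clusters stay put).
        for c in range(n_clusters):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centers, labels


def kth_nn_distance(i, points, k):
    """Outlier score (assumed definition): distance to the k-th nearest neighbor."""
    dists = sorted(math.dist(points[i], q) for j, q in enumerate(points) if j != i)
    return dists[k - 1]


def kmknn_outliers(points, k, n_outliers, n_clusters):
    """Return (indices of the top-n outliers, number of pruned partitions)."""
    centers, labels = kmeans(points, n_clusters)
    clusters = []
    for c in range(n_clusters):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        if idx:
            radius = max(math.dist(points[i], centers[c]) for i in idx)
            clusters.append((radius, idx))
    # Visit the widest partitions first so the pruning threshold rises early.
    clusters.sort(key=lambda t: t[0], reverse=True)

    top = []      # (score, index) pairs, best first, at most n_outliers entries
    pruned = 0
    for radius, idx in clusters:
        have_threshold = len(top) == n_outliers
        # With more than k members, every member has at least k neighbors
        # within 2 * radius, so its score cannot exceed 2 * radius.
        if have_threshold and len(idx) > k and 2 * radius <= top[-1][0]:
            pruned += 1
            continue
        for i in idx:
            top.append((kth_nn_distance(i, points, k), i))
        top = sorted(top, reverse=True)[:n_outliers]
    return [i for _, i in top], pruned


if __name__ == "__main__":
    # Two tight groups of made-up readings plus one isolated reading (index 10).
    group = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5)]
    data = group + [(x + 10.0, y + 10.0) for x, y in group] + [(50.0, 50.0)]
    print(kmknn_outliers(data, k=2, n_outliers=1, n_clusters=3))
```

Neighbor distances are still computed against the full data set, so pruning only skips work and does not change which points are reported, apart from ties. In this sketch the initial centers are sampled randomly, which is the weakness that the canopy and maximum-minimum distance seeding in CMM-KMKNN is meant to address.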
Keywords/Search Tags:bridge monitoring, outlier mining, k-nearest neighbor algorithm, K-means clustering