
The Parallel Stream Data Clustering Algorithm And The Application In Mining Of Traffic HotSpot

Posted on: 2019-02-11
Degree: Master
Type: Thesis
Country: China
Candidate: S J Gao
Full Text: PDF
GTID: 2322330542489085
Subject: Management Science and Engineering
Abstract/Summary:
With the massive growth of data such as traffic and GPS monitoring records, problems such as delays in real-time traffic information and inaccurate traffic prediction have arisen. Processing large-scale real-time data therefore faces higher requirements, and discovering traffic hotspot areas has attracted growing attention. Although research on clustering traffic data has made remarkable progress, existing methods still suffer from poor real-time performance, limited clustering flexibility, and a restriction to simple cluster shapes. To discover traffic hotspots more quickly and in real time, a fast two-stage framework for stream data clustering is proposed and deployed in a distributed manner in the Storm environment. In the online stage, an improved Canopy algorithm generates macro clusters; in the offline stage, the Kmeans algorithm is combined with a sliding time window to generate high-precision clustering results. To further improve the real-time performance of the CK algorithm, the PCK algorithm is proposed and implemented. To verify the performance of PCK, a test dataset is clustered with PCK, CK, and Kmeans, and the three are compared on accuracy, execution time, and scalability. To demonstrate the feasibility of mining traffic hotspots, seven days of taxi positioning data from Beijing are selected as the source data and processed with the PCK algorithm, and the results are visualized as a heat map. The results show that the routes with more frequent taxi activity are consistent with everyday travel experience, and that the set of micro-clusters supports real-time queries of traffic conditions in any time window. It is therefore feasible to use PCK to find hotspot areas, providing a reference for real-time traffic scheduling, real-time dispatch of goods in logistics parks, and similar applications.
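The two-stage idea in the abstract can be illustrated with a small single-machine sketch. It is not the thesis's PCK implementation: the abstract does not say how Canopy was improved or how the work is split across Storm, so this uses a plain, simplified Canopy pass (only the tight threshold, since only seed centers are needed) followed by Lloyd-style Kmeans over the points inside a sliding time window. The `Point` record, the function names, and the parameters `t2`, `window`, and `iters` are all assumptions made for illustration.

```python
import math
from collections import namedtuple

# Hypothetical record type: a timestamp (seconds) and a 2-D position.
Point = namedtuple("Point", ["t", "x", "y"])


def dist(a, b):
    """Euclidean distance between two (x, y) pairs."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def canopy_centers(xy, t2):
    """Simplified Canopy pass: pick a center, drop every point within t2."""
    remaining = list(xy)
    centers = []
    while remaining:
        center = remaining.pop(0)
        centers.append(center)
        # Points within the tight threshold cannot seed another canopy.
        remaining = [p for p in remaining if dist(p, center) > t2]
    return centers


def kmeans(xy, centers, iters=10):
    """Lloyd's iterations, seeded with the given centers."""
    centers = list(centers)
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in xy:
            nearest = min(range(len(centers)), key=lambda j: dist(p, centers[j]))
            groups[nearest].append(p)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [
            (sum(p[0] for p in g) / len(g), sum(p[1] for p in g) / len(g)) if g else c
            for g, c in zip(groups, centers)
        ]
    return centers


def cluster_window(points, now, window, t2):
    """Cluster only the points whose timestamp lies in (now - window, now]."""
    recent = [(p.x, p.y) for p in points if now - window < p.t <= now]
    if not recent:
        return []
    seeds = canopy_centers(recent, t2)  # coarse stage: number and position of seeds
    return kmeans(recent, seeds)        # refinement stage over the current window
```

Calling `cluster_window(stream, now, window, t2)` with a list of `Point` records returns the refined centers for the current window. Seeding Kmeans from Canopy centers removes the need to fix the number of clusters in advance, which is one common motivation for combining the two.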
Keywords/Search Tags:Traffic HotSpot, Real-time Mining, Stream Data Clustering, PCK Algorithm, Storm