The Research And Application Of Massive Spatio-temporal Data Storage And Mining Methods Based On Cloud Computing

Posted on: 2015-03-11
Degree: Master
Type: Thesis
Country: China
Candidate: L Q Ping
Full Text: PDF
GTID: 2268330428965052
Subject: Computer software and theory
Abstract/Summary:
In recent years, large amounts of spatio-temporal data have been collected and stored in distributed databases, increasing the demand for spatio-temporal data mining. In public security traffic management, the sharp growth of traffic flow data and its pronounced spatial and temporal characteristics pose serious challenges for processing such data at scale. Traditional processing methods can no longer meet users' needs for storage space and computational efficiency, so a platform that supports massive data storage and analysis is needed.

Spatio-temporal outlier (ST-Outlier) detection is an important branch of spatio-temporal data mining. In this thesis, we design and implement a large-scale data storage and analysis platform to address the limitations of traditional processing methods in ST-Outlier detection. The main research and achievements are as follows:

(1) We first analyze the technical principles of Hadoop, HBase, Hive and Zookeeper on the cloud platform, and study the principles of HDFS and the Hadoop MapReduce programming model. We then focus on the underlying data storage architecture of the HBase distributed database and the data model of HBase tables. Finally, we build a cloud platform based on Hadoop, HBase, Hive and Zookeeper, and construct an extended HBase+Hive architecture.

(2) We study ST-Outlier detection methods in depth and analyze several existing ST-Outlier detection patterns. Mining a predefined ST-Outlier pattern can yield valuable knowledge. We then propose a new four-step ST-Outlier detection method (data preprocessing, distributed outlier detection, application of knowledge rules, and result verification) based on the cloud computing platform, designed to work effectively and efficiently. A real-world traffic data stream application is used to validate the method.

(3) By studying HBase row key design, we propose an HBase data model based on row keys. With clear design goals, we design distributed secondary index tables and recovery tables based on row keys, and implement a distributed secondary index on HBase for traffic flow data applications. Experiments show that the indexing mechanism supports efficient queries over massive data.

(4) Combining the above research, we design and implement a large-scale data storage and analysis platform consisting of the cloud platform, a back-end program and a front-end display system. The platform integrates the ST-Outlier detection application so that users can easily operate it and view the results.
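
The row-key-based secondary index described in (3) can be illustrated with a small sketch. The abstract does not give the actual row key layout, so everything below is an assumption made for illustration: the `SortedTable` class is a toy in-memory stand-in for an HBase table (it only mimics sorted row keys and prefix scans, and does not use any HBase API), and the field names `checkpoint`, `ts`, `plate` and `speed` are hypothetical. The idea shown is the general one: the main table is keyed for one access path, and a second table whose row keys lead with a different field maps back to the main row keys.

```python
from bisect import bisect_left, insort


class SortedTable:
    """Toy stand-in for an HBase table: rows are kept sorted by row key."""

    def __init__(self):
        self._keys = []   # sorted list of row keys
        self._rows = {}   # row key -> stored value

    def put(self, key, value):
        if key not in self._rows:
            insort(self._keys, key)
        self._rows[key] = value

    def get(self, key):
        return self._rows.get(key)

    def scan_prefix(self, prefix):
        """Yield (key, value) for every row whose key starts with prefix."""
        i = bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            yield self._keys[i], self._rows[self._keys[i]]
            i += 1


def main_key(checkpoint, ts, plate):
    # Hypothetical main-table layout: checkpoint | zero-padded time | plate
    return f"{checkpoint}|{ts:010d}|{plate}"


def index_key(plate, ts, checkpoint):
    # Hypothetical index layout: plate first, so a lookup by plate
    # becomes a prefix scan over the index table.
    return f"{plate}|{ts:010d}|{checkpoint}"


def put_record(main, index, checkpoint, ts, plate, speed):
    """Write one record to the main table and one entry to the index table."""
    key = main_key(checkpoint, ts, plate)
    main.put(key, {"checkpoint": checkpoint, "ts": ts,
                   "plate": plate, "speed": speed})
    index.put(index_key(plate, ts, checkpoint), key)


def find_by_plate(main, index, plate):
    """Scan the index by plate, then fetch each record from the main table."""
    return [main.get(mk) for _, mk in index.scan_prefix(plate + "|")]


# Made-up sample records, for illustration only.
main, index = SortedTable(), SortedTable()
put_record(main, index, "CP01", 1400000000, "A12345", 62)
put_record(main, index, "CP02", 1400000300, "A12345", 71)
put_record(main, index, "CP01", 1400000100, "B67890", 55)
print([r["checkpoint"] for r in find_by_plate(main, index, "A12345")])
```

Zero-padding the timestamp keeps lexicographic key order consistent with time order, so results for one plate come back in chronological order. A real deployment would also have to keep the two tables consistent when a write fails part-way, which is presumably the role of the recovery tables mentioned above; that part is not sketched here.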
Keywords/Search Tags: Data mining, Cloud computing, Traffic data flow, Spatio-temporal outlier detection, Secondary indexing