Research On The Strategy Of Temporal Information Storage And Retrieval Based On Hadoop

Posted on:2015-03-22

Degree:Master

Type:Thesis

Country:China

Candidate:C Y Feng

GTID:2268330428997263

Subject:Computer application technology

Abstract/Summary:

With the continued development of information technology, large volumes of information have become a crucial component of analysis, processing and application across fields and industries, and the most important factor in decision making. Almost all of this information carries temporal features, explicitly or implicitly, so the storage and retrieval of temporal information is a key problem. Research so far has focused on temporal relational database models built on traditional relational databases. When storing and processing temporal data at massive scale and under high concurrency, this approach has hit a bottleneck and revealed problems that are difficult to overcome: highly concurrent reads and writes of temporal data are hard to satisfy, as is the processing of large amounts of complex, unstructured data. Scholars have therefore turned their attention to Hadoop, which is built as a distributed system. Hadoop is an open-source cloud computing framework that scales out horizontally across distributed nodes and can provide storage and computing capacity dynamically. It offers a new approach to storing and quickly retrieving massive temporal data.
For large amounts of unstructured temporal information, this thesis establishes a data storage model for the distributed environment and puts forward a basic method for temporal data processing. It uses HBase, the distributed, unstructured database on the Hadoop platform, to store temporal data, and builds the temporal storage data model from temporal storage units based on temporal sets. Given the characteristics of distributed processing and the temporal set data type, it proposes a method for implementing the relational calculus of massive temporal information under the Map/Reduce model. By extending the relational operations on temporal intervals, it implements relational calculus operations such as intersection and union that take the temporal set as the basic object of temporal data processing. A research example using medical temporal data shows the applicability of the proposed data storage model and relational calculus scheme in a distributed application system.
To meet the need for quick retrieval of massive unstructured temporal information, the thesis designs a Multi-indexed Distributed Hash Table (tDHT) algorithm that retrieves the temporal attribute values of temporal columns quickly and efficiently. Temporal attribute values are mapped into two-dimensional space, converting temporal data into spatial objects; the temporal data area is then divided using spatial data processing methods to generate multi-level temporal data sub-areas, and a multi-indexed DHT directory, stored in HBase, is constructed using the DHT methodology.
The innovations of the thesis include:
(1) To address the efficiency bottleneck met when storing massive unstructured temporal data in a traditional database, a massive temporal data storage model is constructed in HBase and a massive temporal information storage framework is designed.
(2) For query and analysis operations on temporal information in the storage system, a method is proposed for implementing the relational calculus of massive temporal information under the Map/Reduce model. It takes the temporal set as the operand and realizes the temporal relational calculus, including union, intersection and Cartesian product.
(3) To meet the need for quick retrieval and efficient indexing of massive temporal information, a Multi-indexed Distributed Hash Table (tDHT) is designed to retrieve the temporal attribute values of temporal columns efficiently and accurately.
In the last part of the thesis, an efficiency test is conducted according to the design scheme to verify the corresponding data. The experimental results show that the proposed strategy and design are well suited to the storage, query and retrieval of massive temporal information on a cloud computing platform, improve the ability to process massive temporal data, and show good performance.
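
The union and intersection of temporal sets mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes a temporal set is a list of closed integer intervals, which the abstract does not specify. The names normalize, union and intersection are hypothetical, the sample values are invented for illustration, and the sketch runs on a single machine rather than under Map/Reduce or HBase, so it is not the thesis's implementation.

```python
from typing import List, Tuple

# Assumption: a temporal set is a list of closed intervals (start, end)
# with start <= end. The thesis abstract does not define the representation.
Interval = Tuple[int, int]


def normalize(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge overlapping or touching ones."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def union(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Union of two temporal sets."""
    return normalize(a + b)


def intersection(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Intersection of two temporal sets using a two-pointer sweep."""
    a, b = normalize(a), normalize(b)
    result: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo <= hi:
            result.append((lo, hi))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return result


# Invented sample data: two temporal sets measured in days.
stay = [(1, 5), (10, 15)]
medication = [(4, 12), (20, 25)]
print(union(stay, medication))
print(intersection(stay, medication))
```

In a Map/Reduce setting, one plausible arrangement would be for the map phase to emit intervals keyed by the entity they belong to and for the reduce phase to apply functions like these per key, but that split is an assumption and is not described in the abstract.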

Keywords/Search Tags:

Temporal information, Hadoop, HBase, Storage Model, Temporal relational calculus, Index

Related items

1	Research On Storage And Query Processing Of Spatio-temporal Data Based On HBase
2	Research On Spatio-temporal Index Model And Retrieval Methods Based On HBase
3	A Research Of Spatio-Temporal Object Query Processing Technology Oriented To Column Storage Model
4	Design And Implementation Of Temporal RDF Storage System Based On TimeDB
5	Research On Temporal RDF Model Index
6	Research On Index And Storage Of Spatio-temporal XML Database
7	Research On Index And Query Technology Of Spatio-temporal Data Based On Hadoop
8	Research And Implementation Of A Segmentation Hybrid Temporal Index Structure In Database
9	Research On Consistency And Index Based On B Tree Of Temporal XML
10	Studies On Construction And Storage Of Temporal RDF(S)