Research on the Strategy of Temporal Information Storage and Retrieval Based on Hadoop

Posted on: 2015-03-22
Degree: Master
Type: Thesis
Country: China
Candidate: C Y Feng
GTID: 2268330428997263
Subject: Computer application technology
Abstract/Summary:
With the continued development of information technology, large volumes of information have become a crucial input to analysis, processing and applications across many fields and industries, and one of the most important factors in decision making. Almost all of this information carries temporal features, explicitly or implicitly, so the storage and retrieval of temporal information is a key problem. Research so far has focused on temporal relational database models built on traditional relational databases. When storing and processing temporal data at massive scale and under high concurrency, this approach has hit a bottleneck and exposed problems that are difficult to overcome: highly concurrent reads and writes of temporal data are hard to satisfy, as is the processing of large amounts of complex, unstructured data. Researchers have therefore turned their attention to Hadoop, which is built as a distributed system. Hadoop is an open-source cloud computing framework that scales out horizontally across distributed nodes and can provide storage and computing capacity dynamically, which offers a new approach to storing and quickly retrieving massive temporal data.

For large volumes of unstructured temporal information, this paper establishes a data storage model for the distributed environment and puts forward a basic method for temporal data processing. It uses HBase, the distributed, unstructured database on the Hadoop platform, to store temporal data, and builds the temporal storage data model from temporal storage units based on temporal sets. Given the characteristics of distributed processing and the temporal set data type, it proposes a method for implementing the relational calculus of massive temporal information in the Map/Reduce model.
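As a rough illustration of treating a temporal set as an operand, the sketch below models a temporal set as a sorted list of disjoint intervals and folds a list of such sets together the way a reduce step might. The abstract does not give the actual data layout or Map/Reduce job, so everything here is an assumption: the half-open integer intervals, the function names `normalize`, `union`, `intersect` and `reduce_union`, and the in-memory fold that stands in for a Hadoop reducer.

```python
from functools import reduce
from typing import Iterable, List, Tuple

# Assumption: an interval is half-open [start, end) over integer timestamps.
Interval = Tuple[int, int]
TemporalSet = List[Interval]


def normalize(intervals: Iterable[Interval]) -> TemporalSet:
    """Sort intervals and merge overlapping or touching ones into a canonical set."""
    merged: TemporalSet = []
    for start, end in sorted(intervals):
        if start >= end:
            continue  # skip empty intervals
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def union(a: TemporalSet, b: TemporalSet) -> TemporalSet:
    """Union of two temporal sets."""
    return normalize(list(a) + list(b))


def intersect(a: TemporalSet, b: TemporalSet) -> TemporalSet:
    """Intersection of two temporal sets via a two-pointer sweep."""
    a, b = normalize(a), normalize(b)
    out: TemporalSet = []
    i = j = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            out.append((lo, hi))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def reduce_union(values: Iterable[TemporalSet]) -> TemporalSet:
    """Hypothetical reducer body: fold all temporal sets that share one key."""
    return reduce(union, values, [])
```

In a real job, the map phase would emit each record's temporal set under a grouping key and the reduce phase would apply a fold like `reduce_union` to the values collected for that key.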
By extending the relational operations on temporal intervals, the paper implements relational calculus operations such as intersection and union that take the temporal set as the basic object of temporal data processing. A case study on medical temporal data shows the applicability of the proposed data storage model and relational calculus scheme in a distributed application system. To meet the need for quick retrieval of massive unstructured temporal information, the paper designs a Multi-indexed Distributed Hash Table (tDHT) algorithm that retrieves the temporal attribute values of temporal columns quickly and efficiently. Temporal attribute values are mapped into a two-dimensional space, which converts temporal data into spatial objects. The temporal data area is then divided using spatial data processing methods to generate multi-level temporal data sub-areas, and the multi-indexed DHT directory is constructed following the DHT methodology and stored in HBase.

The innovations of the paper include:
(1) To address the efficiency bottleneck met when storing massive unstructured temporal data in a traditional database, it constructs a massive temporal data storage model in HBase and designs a storage framework for massive temporal information.
(2) For query and analysis operations on temporal information in the storage system, it proposes a method for implementing the relational calculus of massive temporal information in the Map/Reduce model, taking the temporal set as the operand to realize temporal relational calculus operations including union, intersection and Cartesian product.
(3) To meet the need for quick retrieval and efficient indexing of massive temporal information, it designs a Multi-indexed Distributed Hash Table (tDHT) that retrieves the temporal attribute values of temporal columns efficiently and accurately.

Following this design, the last part of the paper reports an efficiency test to verify the corresponding data. The experimental results show that the proposed strategy and design are well suited to the storage, query and retrieval of massive temporal information on a cloud computing platform, and that they improve the capacity to process massive temporal data with good performance.
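The mapping from temporal attribute values to a two-dimensional space can be pictured with a small sketch. The abstract does not specify how tDHT defines its sub-areas or directory keys, so the following is only one plausible reading: an interval becomes the point (start, end), each level splits the time range into a 2^level by 2^level grid, and a plain dictionary stands in for the directory that the thesis stores in HBase. The names `to_point`, `cell_key` and `build_directory`, and the key format, are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def to_point(start: float, end: float) -> Tuple[float, float]:
    """Map an interval to a 2D point; valid intervals lie on or above the diagonal."""
    if start > end:
        raise ValueError("interval start must not exceed its end")
    return (start, end)


def cell_key(point: Tuple[float, float], level: int,
             t_min: float, t_max: float) -> str:
    """Return a hypothetical directory key for the grid cell containing the point."""
    cells = 2 ** level                    # cells per axis at this level
    width = (t_max - t_min) / cells       # side length of one cell

    def idx(value: float) -> int:
        # Clamp so that value == t_max falls into the last cell.
        return min(int((value - t_min) / width), cells - 1)

    x, y = point
    return f"L{level}:{idx(x)}:{idx(y)}"


def build_directory(rows: Dict[str, Tuple[float, float]], levels: int,
                    t_min: float, t_max: float) -> Dict[str, List[str]]:
    """Index every row under its cell key at each level from 0 to `levels`."""
    directory: Dict[str, List[str]] = defaultdict(list)
    for row_id, (start, end) in rows.items():
        point = to_point(start, end)
        for level in range(levels + 1):
            directory[cell_key(point, level, t_min, t_max)].append(row_id)
    return directory
```

Under these assumptions, a lookup for a time range would compute the cell keys that the query region covers at a chosen level and fetch only the rows listed under those keys, instead of scanning every temporal column value.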
Keywords/Search Tags: Temporal information, Hadoop, HBase, Storage Model, Temporal relational calculus, Index