Research On Data Model Selection Methods For Complex Analysis Tasks Of Heterogeneous Data

Posted on: 2019-01-08
Degree: Master
Type: Thesis
Country: China
Candidate: J P Zhang
Full Text: PDF
GTID: 2438330566973393
Subject: Computer Science and Technology
Abstract/Summary:
With the development of big data, data volumes are growing rapidly and data arrives in a variety of formats. The limitations of traditional relational databases in handling analysis tasks over such data are becoming increasingly apparent. As a result, more data is stored in non-relational form, such as graph data, document data, and key-value data, and complex analysis tasks over heterogeneous data are becoming more common. Such tasks run queries over combinations of heterogeneous data that involve multiple data models. Heterogeneity makes the data hard to maintain and manage, and it also makes it hard to achieve optimal execution performance for analysis tasks, so more resources are consumed. It is therefore necessary to unify the heterogeneous data into a single data model for analysis. How to determine the proper data model and unify the heterogeneous data involved in a query, so as to achieve higher analysis efficiency, is an urgent problem.

The traditional solution is to integrate all the data and use XML as the intermediate transformation model; the analysis task then executes on the XML-formatted data. Because of the query characteristics of XML data, this approach struggles to meet the demands of analyzing large amounts of data. Moreover, NoSQL database technology now supports conversion between relational and non-relational data, which provides more options for data storage, so an analysis task can be executed entirely on a non-relational data storage model. We believe that there is an optimal data model for the heterogeneous data combination involved in an analysis task, and that unifying the heterogeneous data into this model yields the optimal analysis performance.

In this paper, we propose a data model selection method based on cost estimation. A cost model is introduced to estimate the execution cost of query tasks on different data models. The query cost serves as the criterion for evaluating each data model, and the data model with the minimum query cost is chosen. The analysis task can obtain the optimal performance with the selected model. The main research contents of this paper are: (1) studying the data models and query features involved in heterogeneous data combinations; (2) designing and implementing a query cost estimation model for non-relational data; (3) comparing the query cost of relational and non-relational data based on cost estimation, and completing the selection of the optimal data model for a heterogeneous data combination; (4) verifying the accuracy and validity of the method using the data sets and query cases of the BigBench benchmark.
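
The selection step described in the abstract (estimate the query cost on each candidate data model, then pick the minimum) might be sketched as follows. This is only an illustration and not the thesis's actual cost model: the operation kinds, the per-row weights in `COST_PER_ROW`, and the names `QueryOp`, `estimate_cost`, and `select_model` are all hypothetical placeholders chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryOp:
    """One step of an analysis query (hypothetical representation)."""
    kind: str   # "scan", "join", or "traverse"
    rows: int   # estimated number of input rows for this step


# Hypothetical per-row cost weights for each candidate data model.
# These numbers are invented for illustration; a real cost model would
# derive them from statistics and measurements on the target systems.
COST_PER_ROW = {
    "relational": {"scan": 1.0, "join": 4.0, "traverse": 6.0},
    "document":   {"scan": 1.2, "join": 8.0, "traverse": 5.0},
    "graph":      {"scan": 2.0, "join": 3.0, "traverse": 1.0},
}


def estimate_cost(model: str, ops: list[QueryOp]) -> float:
    """Sum the weighted cost of every operation under one data model."""
    weights = COST_PER_ROW[model]
    return sum(weights[op.kind] * op.rows for op in ops)


def select_model(ops: list[QueryOp]) -> tuple[str, dict[str, float]]:
    """Return the model with the minimum estimated cost, plus all costs."""
    costs = {model: estimate_cost(model, ops) for model in COST_PER_ROW}
    best = min(costs, key=costs.get)
    return best, costs


if __name__ == "__main__":
    # A query that scans 1000 rows and then traverses 500 relationships.
    query = [QueryOp("scan", 1000), QueryOp("traverse", 500)]
    best, costs = select_model(query)
    print(best, costs)
```

Under these made-up weights, a traversal-heavy query favors the graph model while a pure scan favors the relational model, which mirrors the idea that the best unified model depends on the query workload.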
Keywords/Search Tags:Non-relational data, Heterogeneous data, Cost estimation, Selection of data model, Optimal data model