Research On Data Model Selection Methods For Complex Analysis Tasks Of Heterogeneous Data

Posted on:2019-01-08

Degree:Master

Type:Thesis

Country:China

Candidate:J P Zhang

Full Text:PDF

GTID:2438330566973393

Subject:Computer Science and Technology

Abstract/Summary:


With the development of big data, data volumes are growing rapidly and data come in a variety of formats. The limitations of traditional relational databases in handling analysis tasks over such data are becoming increasingly obvious. As a result, more data are stored in non-relational forms, such as graph data, document data, and key-value data, and complex analysis tasks over heterogeneous data are becoming more common. Such tasks run queries over combinations of heterogeneous data that involve multiple data models. Heterogeneity not only complicates data maintenance and management but also makes it hard to achieve optimal execution performance, so analysis tasks consume more resources. It is therefore necessary to unify the heterogeneous data into a single data model for analysis. How to determine the proper data model and unify the heterogeneous data involved in a query so as to achieve higher analysis efficiency is an urgent problem.

The traditional solution is to integrate all the data and use XML as the intermediate transformation model; the analysis task then executes on the XML-format data. Because of the query characteristics of XML data, this approach struggles to meet the demands of analyzing large amounts of data. Moreover, as NoSQL database technology has developed, it has come to support conversion between relational and non-relational data, which provides more options for data storage. Analysis tasks can therefore be executed entirely on a non-relational data storage model. We believe that there is an optimal data model for the heterogeneous data combination involved in an analysis task, and that unifying the heterogeneous data into this model yields the optimal analysis performance.

In this paper, we propose a data model selection method based on cost estimation. A cost model is introduced to estimate the execution cost of query tasks on different data models. The query cost serves as the criterion for evaluating data models, and the data model with the minimum query cost is chosen, so that the analysis task obtains the optimal performance on the selected model. The main research contents of this paper are: (1) studying the data models and query features involved in heterogeneous data combinations; (2) designing and implementing a query cost estimation model for non-relational data; (3) comparing the query costs of relational and non-relational data based on cost estimation and completing the selection of the optimal data model for a heterogeneous data combination; (4) verifying the accuracy and validity of the method using the data sets and query cases of the BigBench benchmark.
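
The selection step described above (estimate the query cost on each candidate data model, then pick the minimum) can be illustrated with a minimal Python sketch. It is not the cost model from the thesis, which the abstract does not specify. The `QueryProfile` fields, the `UNIT_COSTS` weights, and the linear cost formula are all hypothetical placeholders chosen only to show the shape of the approach.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryProfile:
    """Hypothetical summary of the work a query performs."""
    rows_scanned: int
    joins: int
    traversal_hops: int
    key_lookups: int


# Hypothetical per-operation unit costs for each candidate data model.
# Real values would have to be calibrated against the target systems.
UNIT_COSTS = {
    "relational": {"scan": 1.0, "join": 50.0, "hop": 200.0, "lookup": 5.0},
    "document": {"scan": 1.2, "join": 120.0, "hop": 150.0, "lookup": 3.0},
    "graph": {"scan": 2.0, "join": 80.0, "hop": 10.0, "lookup": 4.0},
    "key_value": {"scan": 3.0, "join": 300.0, "hop": 250.0, "lookup": 1.0},
}


def estimate_cost(model: str, query: QueryProfile) -> float:
    """Estimate the cost of a query on one data model (assumed linear formula)."""
    c = UNIT_COSTS[model]
    return (
        query.rows_scanned * c["scan"]
        + query.joins * c["join"]
        + query.traversal_hops * c["hop"]
        + query.key_lookups * c["lookup"]
    )


def select_model(query: QueryProfile) -> tuple[str, dict[str, float]]:
    """Return the data model with the minimum estimated cost, plus all estimates."""
    costs = {model: estimate_cost(model, query) for model in UNIT_COSTS}
    best = min(costs, key=costs.get)
    return best, costs


if __name__ == "__main__":
    # A traversal-heavy query and a join-heavy query, both made up for illustration.
    traversal_query = QueryProfile(rows_scanned=100, joins=0, traversal_hops=50, key_lookups=0)
    join_query = QueryProfile(rows_scanned=1000, joins=2, traversal_hops=0, key_lookups=0)
    for name, q in [("traversal", traversal_query), ("join", join_query)]:
        best, costs = select_model(q)
        print(name, "->", best, costs)
```

With these made-up weights, a traversal-heavy query favors the graph model and a join-heavy query favors the relational model. In practice the estimates would come from a calibrated cost model and not from fixed constants.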

Keywords/Search Tags:

Non-relational data, Heterogeneous data, Cost estimation, Selection of data model, Optimal data model


Related items

1	Research On Heterogeneous Data Exchange Model Based On XML
2	Research On Open Scalable Relational Data Model And Data Partition Strategy
3	Research And Implementation Of Heterogeneous Data Conversion Based On Xml Technologies
4	Research And Implementation Of Heterogeneous Data Conversion Based On XML Technologies
5	Design And Implementation Of An Adaptive Storage System Oriented At Heterogeneous Big Data Of Power Grid
6	The Design And Realization Of Base Data Manage System Using Joint C3 Information Exchage Data Model
7	Research On The Application Of Heterogenous Data Source Integration In University Data Center Project
8	Multidimensional Data Modeling Method Based On The Relational Data Model
9	Research On Methods Of Learning Statistical Relational Model
10	Data Warehouse Model Design And Data Analysis For Cost Control