Research And Implementation Of Probability Query Approach For Big Data

Posted on:2018-01-30

Degree:Master

Type:Thesis

Country:China

Candidate:J B Wu

Full Text:PDF

GTID:2428330542988040

Subject:Software engineering

Abstract/Summary:

The deep integration of the ternary human-cyber-physical universe has caused an explosion in data volume and in the complexity of data models, and the networked era of big data has arrived. Big data has enormous commercial value, which is directly reflected in the prevalence of artificial intelligence, machine learning and deep learning technology. The most important task in mining valuable information is obtaining the valid data from a massive data set, which is also necessary for many data application systems and is the central problem of big data query processing. As one of the main issues in the field, big data query processing has attracted great attention since its inception. As big data technology has developed and matured, led by the Hadoop ecosystem, more and more big data query systems have appeared on the market, such as Hive, Spark SQL, Dremel and BlinkDB. These systems fall into two groups, exact and approximate, which differ in how they optimize queries. The former depend heavily on parallel processing to optimize queries. The latter additionally rely on traditional approximate query techniques, such as stratified sampling, on top of parallel processing and storage.

This thesis studies big data query optimization from the perspective of incomplete queries and proposes a probabilistic query approach for big data. The core idea is to improve query performance by lowering the probability of retrieving the complete data. The thesis calls this "recall at confidence": confidence is the degree of reliability, and recall is the probability of obtaining the complete data. Confidence is the precondition for obtaining satisfactory data.

First, a probabilistic query model is defined, which describes the logical and physical organization of the data and the mapping between the two. Second, methods for storing and querying data are designed, and a probability distribution algorithm and a probabilistic query algorithm are proposed. Third, Probery, a prototype probabilistic query system, is designed and implemented. Finally, experiments are designed to confirm the rationality and validity of probabilistic query, and the comparison demonstrates the practical value of this research. Data query is used in a wide variety of applications, and the probabilistic query proposed in this thesis offers a new approach to querying: it helps users choose a reasonable query method according to their requirements and improves query performance. This research also helps to promote the development and application of big data query technology in the field of probabilistic query.
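
To make the "recall at confidence" trade-off concrete, the following is a minimal Python sketch, not the algorithm used in Probery. It assumes that each record is placed in one of several partitions according to a known probability distribution, and that a query scans only the most probable partitions until their combined placement probability reaches the requested recall. The names `place`, `plan` and `query`, the partition weights, and the greedy selection rule are all hypothetical choices made for illustration.

```python
import random
from collections import defaultdict


def place(records, weights, rng):
    """Scatter records over partitions; partition i is chosen with probability weights[i].

    Assumption for this sketch: the placement distribution is known at query time.
    """
    partitions = defaultdict(list)
    ids = list(range(len(weights)))
    for rec in records:
        pid = rng.choices(ids, weights=weights, k=1)[0]
        partitions[pid].append(rec)
    return partitions


def plan(weights, target_recall):
    """Greedily pick the most probable partitions until their total probability
    reaches target_recall. Returns (chosen partition ids, covered probability)."""
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    chosen, covered = [], 0.0
    for pid in order:
        if covered >= target_recall:
            break
        chosen.append(pid)
        covered += weights[pid]
    return chosen, covered


def query(partitions, chosen):
    """Scan only the chosen partitions and return the records found there."""
    return [rec for pid in chosen for rec in partitions.get(pid, [])]


if __name__ == "__main__":
    weights = [0.5, 0.3, 0.15, 0.05]  # hypothetical placement probabilities
    parts = place(range(10_000), weights, random.Random(7))
    chosen, covered = plan(weights, target_recall=0.75)
    hits = query(parts, chosen)
    print("partitions scanned:", chosen)
    print("expected recall:", covered)
    print("observed recall:", len(hits) / 10_000)
```

In this sketch a lower target recall means fewer partitions are scanned, which is where the performance gain would come from. The observed recall fluctuates around the expected value, which is why a confidence level is needed alongside the recall target.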

Keywords/Search Tags:

Big Data, Query Optimization, Probability Query, Recall, Confidence

Related items

1	Research On Distributed Query Processing And Optimization Of RDF Data
2	Research On Distributed Query Optimization And Implementation Of Data Governance Platform
3	Multimedia Retrieval System Based On Content
4	Design And Realization Of Optimized Query Strategy About Multi-Tenant Saas Based Application
5	Research On Data Query Processing And Optimization In Distributed Database
6	Query Processing And Optimization Over Various Types Of Streaming Data
7	Research On Techniques And Systems For Index And Query Optimization Of Big Data
8	A Study Of XML Data Based On Query Optimization
9	Query Processing And Optimization In Heterogeneous Information Integration
10	Keyword Query For RDF Data Based On Query Translation