
Research And Implementation Of Probability Query Approach For Big Data

Posted on: 2018-01-30
Degree: Master
Type: Thesis
Country: China
Candidate: J B Wu
Full Text: PDF
GTID: 2428330542988040
Subject: Software engineering
Abstract/Summary:
The deep integration of the ternary human-cyber-physical universe has caused an explosion in data volume and in the complexity of data models, and the networked era of big data has arrived. Big data has commercial value that should not be underestimated, as the prevalence of artificial intelligence, machine learning, and deep learning directly reflects. The most important task in mining valuable information is retrieving the valid data from massive data sets, which is also necessary for many data application systems and is the central concern of big data query processing. As one of the main issues in the field, big data processing has attracted close attention since its inception. As big data technology has developed and matured, led by the Hadoop ecosystem, more and more big data query systems have appeared on the market, such as Hive, Spark SQL, Dremel, and BlinkDB. These systems fall into two groups, exact and approximate, which differ in how they optimize queries. The former relies heavily on parallel processing; the latter, in addition to parallel processing and storage, also relies on traditional approximate query techniques such as stratified sampling.

This thesis studies big data query optimization from the perspective of incomplete queries and proposes a probabilistic query approach for big data. The core idea is to improve query performance by lowering the probability of retrieving the complete data. The thesis calls this "recall at confidence": recall is the probability of obtaining the complete data, and confidence is the degree of reliability. Confidence is thus the precondition for obtaining satisfactory data. First, a probabilistic query model is defined, which describes the logical and physical organization of data and the mapping between the two. Second, methods for storing and querying data are designed, and a probability distribution algorithm and a probabilistic query algorithm are proposed. Third, Probery, a prototype probabilistic query system, is designed and implemented. Finally, experiments are designed to confirm the rationality and validity of probabilistic query, and a comparison demonstrates the practical value of the research.

Data query is used across a wide variety of applications. The probabilistic query proposed in this thesis offers a new approach: it helps users choose a query method suited to their requirements and improves query performance. The research also helps advance the development and application of big data query technology in the field of probabilistic query.
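The trade-off between recall and the amount of data scanned can be illustrated with a minimal sketch. It is not the thesis's algorithm, and it does not model confidence. It assumes, purely for illustration, that each matching record lands in a given partition with a known probability, and that a query reads partitions in descending order of that probability until the requested recall is covered. The function name `partitions_for_recall` and the `placement` mapping are hypothetical.

```python
from typing import Dict, List


def partitions_for_recall(placement: Dict[str, float], target_recall: float) -> List[str]:
    """Pick partitions, most likely first, until their probabilities cover the target.

    `placement` maps a partition name to the assumed probability that a
    matching record is stored there (the values are expected to sum to 1).
    """
    if not 0.0 < target_recall <= 1.0:
        raise ValueError("target_recall must be in (0, 1]")
    chosen: List[str] = []
    covered = 0.0
    # Read the most likely partitions first so that fewer partitions are scanned.
    ranked = sorted(placement.items(), key=lambda item: item[1], reverse=True)
    for name, probability in ranked:
        # A small tolerance absorbs floating-point rounding in the running sum.
        if covered >= target_recall - 1e-12:
            break
        chosen.append(name)
        covered += probability
    return chosen


# Hypothetical placement probabilities for four partitions.
placement = {"p0": 0.5, "p1": 0.3, "p2": 0.15, "p3": 0.05}
print(partitions_for_recall(placement, 0.8))  # ['p0', 'p1']
print(partitions_for_recall(placement, 1.0))  # ['p0', 'p1', 'p2', 'p3']
```

Under these assumptions, lowering the requested recall from 1.0 to 0.8 halves the number of partitions read, which is the kind of performance gain the abstract attributes to accepting incomplete results.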
Keywords/Search Tags:Big Data, Query Optimization, Probability Query, Recall, Confidence