Research On Approximate Query Algorithm For Real-time Analysis Of Massive Data

Posted on: 2018-03-25 | Degree: Master | Type: Thesis
Country: China | Candidate: S L Ni | Full Text: PDF
GTID: 2358330512976805 | Subject: Computer application technology
Abstract/Summary:
With the development of modern science and technology, the data generated by all kinds of information systems is growing rapidly and arriving ever faster, producing large volumes of streaming data. How to process such massive data quickly has become a hot research topic. A data stream is a new form of data that is dynamic, fast-changing, and unbounded, and traditional data processing cannot compute over it quickly and accurately. In this context, this thesis studies approximate query algorithms for data streams in combination with data mining techniques. It combines stream processing with batch processing and uses an improved sliding window to provide approximate querying over streams. An improved stratification algorithm is used to draw stratified samples from the massive historical data generated by the stream, minimizing the influence of biased values on query results. The main research work of this thesis is as follows.

(1) Factor analysis is used to classify the attributes of a multidimensional dataset and to reduce the number of stratification dimensions. A clustering optimization algorithm, HC-UPGMD, is proposed to improve the clustering results, which serve as the basis for stratification in the stratified sampling of historical data in the subsequent approximate query model.

(2) A stratified sampling method based on attenuated weight allocation over the sliding window is proposed. It divides the sliding window into several basic windows and sets the weight of each basic window through an attenuation function. The sampling ratio of each basic window is then set according to its weight and the number of data elements it contains.

(3) An approximate query model for real-time analysis of massive data is proposed based on the above algorithms, and the function of each module and the corresponding implementation of the algorithms are described in detail.

Experiments verifying the above algorithms and models are presented in the corresponding chapters. They show that the algorithms and models are practical and effective, and that they can be widely applied to real-time analysis of large-scale data streams in smart city and military settings.
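
The weight-based allocation described in item (2) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not give the attenuation function, so an exponential decay exp(-decay * age) is assumed here, and the names allocate_sample_sizes, sample_sliding_window, estimate_sum, and the parameters budget and decay are hypothetical. Each basic window is treated as one stratum, and the sample budget is split in proportion to weight times element count.

```python
import math
import random


def allocate_sample_sizes(window_sizes, budget, decay=0.5):
    """Split a sample budget across basic windows (hypothetical helper).

    window_sizes[0] is the oldest basic window, window_sizes[-1] the newest.
    The weight exp(-decay * age), with age 0 for the newest window, is an
    assumed attenuation function, not the one used in the thesis.
    """
    k = len(window_sizes)
    weights = [math.exp(-decay * (k - 1 - i)) for i in range(k)]
    scores = [w * n for w, n in zip(weights, window_sizes)]
    total = sum(scores)
    sizes = []
    for score, n in zip(scores, window_sizes):
        if n == 0:
            sizes.append(0)
            continue
        # Proportional share of the budget, at least 1 and at most n,
        # so every non-empty window contributes to the estimate.
        # Because of rounding and clamping, sizes may not sum exactly to budget.
        share = round(budget * score / total)
        sizes.append(min(n, max(1, share)))
    return sizes


def sample_sliding_window(basic_windows, budget, decay=0.5, seed=None):
    """Draw a simple random sample, without replacement, from each basic window."""
    rng = random.Random(seed)
    sizes = allocate_sample_sizes([len(w) for w in basic_windows], budget, decay)
    return [rng.sample(w, m) for w, m in zip(basic_windows, sizes)]


def estimate_sum(basic_windows, samples):
    """Approximate SUM over the whole sliding window.

    Each window's sample mean is scaled by that window's element count,
    which is the standard stratified estimator of a total.
    """
    total = 0.0
    for window, sample in zip(basic_windows, samples):
        if sample:
            total += len(window) * sum(sample) / len(sample)
    return total


if __name__ == "__main__":
    data_rng = random.Random(0)
    # Three basic windows, oldest first, with synthetic values.
    windows = [[data_rng.uniform(0, 10) for _ in range(1000)] for _ in range(3)]
    samples = sample_sliding_window(windows, budget=150, seed=1)
    print("sample sizes:", [len(s) for s in samples])
    print("approximate SUM:", estimate_sum(windows, samples))
    print("exact SUM:", sum(sum(w) for w in windows))
```

Under these assumptions, newer basic windows receive a larger share of the sample, so recent data dominates the estimate's precision while older windows still contribute.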
Keywords/Search Tags:Massive Data, Streaming Data, Approximate Query, Stratified Sampling, Stream Processing and Batch Processing