Research On Top-k Query Algorithm For Uncertain Data

Posted on:2021-01-08

Degree:Master

Type:Thesis

Country:China

Candidate:C Zhang

Full Text:PDF

GTID:2428330623479540

Subject:Software engineering

Abstract/Summary:

PDF Full Text Request

With the continuous progress of society and the continuous development of computer field,the amount of data has shown an explosive growth trend.In order to get the key information from the massive data quickly and accurately,it is urgent to need efficient and low-cost query processing methods.However,due to the inaccuracy of the original data,the lack of processing value,meeting the special needs,the collection of coarse-grained data and protecting the privacy of users,a large number of uncertain data exist in various application fields.At present,the research and processing of uncertain data has become one of the research hotspots.Although the query processing of traditional deterministic data has reached maturity,the traditional algorithm is not suitable for uncertain data,and now the data is mostly in the form of parallel data streams in the big data environment.In the face of uncertain data streams,the traditional Top-k query algorithm can not solve the uncertain data streams well.At present,researchers have achieved many results,but the existing algorithms have some limitations in the face of uncertain data streams.Because of the characteristics of uncertain data streams,such as high flow rate,large amount of data,and the special probability of data,it can not only return the former part of data with the best sorting score,but also consider the data structure,the relationship between data and probability,so it is still very complex to deal with uncertain data streams.In view of the above problems,the main work of this thesis is as follows:1.In order to reduce the amount of computation,this thesis proposes a Top-k dominating query algorithm for uncertain data,which uses several effective pruning methods and dominating relationships.The algorithm first uses the possible world model to model the uncertain data,and obtains the calculation method of the Top-k probability according to the possible world examples.Then,it prunes the data by comparing the ranking score range,the probability threshold with the Top-k probability and the definition of the dominance relationship so as to reduce the follow-up work and improve the query efficiency.2.In this thesis,an improved parallel Top-k dominating algorithm is proposed for the query processing of parallel uncertain data streams.On the basis of the original algorithm,the operation steps of clustering,structure generation,dominating relationship and the relationship between threshold and data are added.In the process of transmission,the algorithm uses the method of recording the state of the data in the global candidate result set and the minimum time to reduce the calculation frequency of the dominant score of the data and reduce the repeated workload of the computer.After analyzing the characteristics of the algorithm,the experimental results show that compared with the existing work,the algorithm can save a lot of computing time,and can provide high-precision computing results in most cases.3.A prototype system to verify the feasibility of Top-k query algorithm based on parallel uncertain data streams is designed and implemented in this thesis.First,the implementation process and interface of the prototype system are described,and then the function of each module in the system is analyzed,including data preprocessing module,data pruning module,data transmission calculation module and the final result calculation output module.The prototype system designed in this thesis can well implement the Top-k query algorithm proposed in this thesis and has a good operability.

Keywords/Search Tags:

uncertain data streams, top-k query, pruning method, dominating algorithm

PDF Full Text Request

Related items

1	Reseach On K-Skyband Based Top-K Dominating Query Over Distributed Data Streams
2	Research On Algorithm Of Uncertain Data Flow Query Processing
3	Uncertain Clustering Method And Its Application In Data Streams Processing
4	Research On Skyline Query Over Uncertain Data Streams
5	Research On Distributed Parallel Skyline Query Processing Technology Over Uncertain Data Streams
6	ER-Topk Query Processing On Ucertain Streams
7	Research Of Skyline Query Processing On Uncertain Dataset Of WSNs
8	Top-k Complex Event Query Technology For Data Streams
9	Adaptive Processing Of Ad-hoc Queries On Data Streams
10	Research On Location And Textual Based Queries And Authentication Strategies