Font Size: a A A

Research On Top-k Query Algorithm For Uncertain Data

Posted on:2021-01-08Degree:MasterType:Thesis
Country:ChinaCandidate:C ZhangFull Text:PDF
GTID:2428330623479540Subject:Software engineering
Abstract/Summary:PDF Full Text Request
With the continuous progress of society and the continuous development of computer field,the amount of data has shown an explosive growth trend.In order to get the key information from the massive data quickly and accurately,it is urgent to need efficient and low-cost query processing methods.However,due to the inaccuracy of the original data,the lack of processing value,meeting the special needs,the collection of coarse-grained data and protecting the privacy of users,a large number of uncertain data exist in various application fields.At present,the research and processing of uncertain data has become one of the research hotspots.Although the query processing of traditional deterministic data has reached maturity,the traditional algorithm is not suitable for uncertain data,and now the data is mostly in the form of parallel data streams in the big data environment.In the face of uncertain data streams,the traditional Top-k query algorithm can not solve the uncertain data streams well.At present,researchers have achieved many results,but the existing algorithms have some limitations in the face of uncertain data streams.Because of the characteristics of uncertain data streams,such as high flow rate,large amount of data,and the special probability of data,it can not only return the former part of data with the best sorting score,but also consider the data structure,the relationship between data and probability,so it is still very complex to deal with uncertain data streams.In view of the above problems,the main work of this thesis is as follows:1.In order to reduce the amount of computation,this thesis proposes a Top-k dominating query algorithm for uncertain data,which uses several effective pruning methods and dominating relationships.The algorithm first uses the possible world model to model the uncertain data,and obtains the calculation method of the Top-k probability according to the possible world examples.Then,it prunes the data by comparing the ranking score range,the probability threshold with the Top-k probability and the definition of the dominance relationship so as to reduce the follow-up work and improve the query efficiency.2.In this thesis,an improved parallel Top-k dominating algorithm is proposed for the query processing of parallel uncertain data streams.On the basis of the original algorithm,the operation steps of clustering,structure generation,dominating relationship and the relationship between threshold and data are added.In the process of transmission,the algorithm uses the method of recording the state of the data in the global candidate result set and the minimum time to reduce the calculation frequency of the dominant score of the data and reduce the repeated workload of the computer.After analyzing the characteristics of the algorithm,the experimental results show that compared with the existing work,the algorithm can save a lot of computing time,and can provide high-precision computing results in most cases.3.A prototype system to verify the feasibility of Top-k query algorithm based on parallel uncertain data streams is designed and implemented in this thesis.First,the implementation process and interface of the prototype system are described,and then the function of each module in the system is analyzed,including data preprocessing module,data pruning module,data transmission calculation module and the final result calculation output module.The prototype system designed in this thesis can well implement the Top-k query algorithm proposed in this thesis and has a good operability.
Keywords/Search Tags:uncertain data streams, top-k query, pruning method, dominating algorithm
PDF Full Text Request
Related items