Research On Top-K Query Processing Of Incomplete Data

Posted on: 2022-07-11
Degree: Master
Type: Thesis
Country: China
Candidate: C M Liang
Full Text: PDF
GTID: 2518306572991249
Subject: Computer software and theory
Abstract/Summary:
With the continuing spread and rapid development of information technology, data volumes are growing exponentially. While these data profoundly change how people work and live, they also bring many challenges, such as data quality problems. Because various factors interfere with data acquisition and transmission, incomplete data are widespread in many fields. Query processing over incomplete data has been studied in depth in data analysis, machine learning, and other fields. Among the many query types, Top-K queries retrieve the k objects that users are most interested in. Existing Top-K query techniques for incomplete data either target streaming data or rely on crowdsourcing. Because dynamic and static data differ in essential ways, and crowdsourcing requires specific conditions, these techniques cannot be applied directly to Top-K queries over incomplete static data. To solve this problem, this thesis first defines the Top-K query problem over incomplete data and its dominance relationship, and then proposes an algorithm that combines a pruning strategy with a filling strategy. Comparative experiments on the real-world bilibili Videos data set and on synthetic data sets evaluate the impact of data size and missing rate on the algorithm. The experimental results show that the proposed algorithm achieves a higher recall rate and a shorter execution time than similar algorithms without a pruning strategy.

When a Top-K query is performed on incomplete data, expected tuples may not appear in the query results; this is the "Why-Not" question. Answering Why-Not questions helps to improve the completeness and accuracy of query results. The Why-Not question has been studied extensively for Top-K queries, spatial keyword Top-K queries, and reverse Top-K queries. However, these studies focus on complete data, so existing methods cannot answer the Why-Not question for Top-K queries over incomplete data. To address this problem, this thesis first defines a cost function, then adjusts the query and changes the filling values so that the missing expected tuples appear in the query result, and finds the solution with the least cost. Extensive experiments on the bilibili Videos data set and synthetic data sets show that the algorithms achieve high accuracy and efficiency.
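
The abstract does not spell out the pruning or filling strategy, so the following is only a minimal sketch of how the two ideas can be combined for a Top-K query over incomplete static data. Everything in it is an assumption made for illustration, not the thesis's algorithm: missing values are represented as `None`, the filling strategy is column-mean imputation, the ranking function is a weighted sum with non-negative weights, and the names `column_stats` and `top_k_incomplete` are hypothetical.

```python
import heapq
from typing import List, Optional, Sequence, Tuple

Row = Sequence[Optional[float]]  # None marks a missing attribute value


def column_stats(rows: Sequence[Row]) -> Tuple[List[float], List[float]]:
    """Per-attribute mean and maximum over the observed (non-missing) values."""
    means, maxima = [], []
    for d in range(len(rows[0])):
        observed = [r[d] for r in rows if r[d] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
        maxima.append(max(observed) if observed else 0.0)
    return means, maxima


def top_k_incomplete(rows: Sequence[Row], weights: Sequence[float],
                     k: int) -> List[Tuple[float, int]]:
    """Return up to k (score, row_index) pairs, highest score first.

    Assumed filling strategy: replace a missing value by the column mean.
    Assumed pruning strategy: skip a row when an optimistic upper bound on
    its score (missing values replaced by the column maximum) cannot beat
    the current k-th best score. The bound is valid only for non-negative
    weights, because then mean <= max implies filled score <= upper bound.
    """
    if not rows or k <= 0:
        return []
    means, maxima = column_stats(rows)
    heap: List[Tuple[float, int]] = []  # min-heap holding the current top k
    for i, row in enumerate(rows):
        upper = sum(w * (v if v is not None else m)
                    for w, v, m in zip(weights, row, maxima))
        if len(heap) == k and upper <= heap[0][0]:
            continue  # pruned: this row cannot enter the top k
        score = sum(w * (v if v is not None else m)
                    for w, v, m in zip(weights, row, means))
        if len(heap) < k:
            heapq.heappush(heap, (score, i))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, i))
    return sorted(heap, key=lambda t: (-t[0], t[1]))


# Example call on made-up data: four objects, two attributes, equal weights.
sample = [(10, 2), (None, 9), (3, None), (1, 1)]
result = top_k_incomplete(sample, weights=(0.5, 0.5), k=2)
```

With mean imputation the bound check costs about as much as computing the filled score, so the pruning saves little here; it is meant to show where a bound would pay off when the filling step is more expensive than the check.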
Keywords/Search Tags: Incomplete data, Top-K query, Why-Not question, Preference query, Data imputation