Research On Top-K Query Processing Of Incomplete Data

Posted on:2022-07-11

Degree:Master

Type:Thesis

Country:China

Candidate:C M Liang

Full Text:PDF

GTID:2518306572991249

Subject:Computer software and theory

Abstract/Summary:

With the continuing spread and rapid development of information technology, data volumes are growing exponentially. While these data profoundly change how people work and live, they also bring challenges such as data quality problems. Because many factors interfere with data acquisition and transmission, incomplete data are common in many fields. Query processing over incomplete data has been studied in depth in data analysis, machine learning, and other areas. Among the many query types, a Top-K query retrieves the k objects a user is most interested in. Existing Top-K query techniques for incomplete data either target streaming data or rely on crowdsourcing. Because dynamic and static data differ in essential ways, and crowdsourcing requires specific conditions, these techniques cannot be applied directly to Top-K queries over incomplete static data. To solve this problem, this thesis first defines the Top-K query problem over incomplete data and its dominance relationship, and then proposes an algorithm that combines a pruning strategy with a filling (imputation) strategy. Through comparative experiments on the real-world bilibili Videos data set and on synthetic data sets, the thesis evaluates the impact of data size and missing rate on the algorithm. The experimental results show that the proposed algorithm achieves a higher recall rate and a shorter execution time than similar algorithms without a pruning strategy.

When a Top-K query is run over incomplete data, tuples the user expects may not appear in the query results; this is the "Why-Not" question. Answering Why-Not questions helps improve the completeness and accuracy of query results. The Why-Not question has been studied extensively for Top-K queries, spatial keyword Top-K queries, and reverse Top-K queries. However, these studies focus on complete data, so the existing methods cannot answer Why-Not questions for Top-K queries over incomplete data. To address this problem, this thesis first defines a cost function, then adjusts the query and changes the filling values so that the missing expected tuples appear in the query result, and finds the solution with the least cost. Extensive experiments on the bilibili Videos data set and on synthetic data sets show that the algorithms achieve high accuracy and efficiency.

Keywords/Search Tags:

Incomplete data, Top-K query, Why-Not question, Preference query, Data imputation

Related items

1	Research On SKYLINE Preference Query Technology Over Incomplete Data
2	Research Of Skyline Preference Query Based On Incomplete Dataset
3	Research And Application Of Incomplete Data Imputation Algorithm
4	Research And Implementation Of Spatial Text Data Query Processing Technology
5	Research On Multidimensional Data Query Processing Based On User Preferences
6	Research On The Compression-based Approximate Query Method For Massive Incomplete Data
7	The Study Of Key Technologies For Uncertain RFID Stream Data Management
8	Research On Incomplete Data Imputation In Sensor Networks
9	Attribute Correlation Modeling And Missing Value Imputation Of Incomplete Data Based On Fuzzy Partition
10	Studies On Technologies Of Flexible Query For XML