Research On Skyline-Join Query Processing Of Incomplete Datasets With Crowdsourcing

Posted on: 2021-03-04 | Degree: Master | Type: Thesis
Country: China | Candidate: X Zhang | Full Text: PDF
GTID: 2428330611453097 | Subject: Computer system architecture
Abstract/Summary:
A skyline query returns the data that users are most likely to be interested in. It is a research hotspot in the database field and is widely used in applications such as flight search, product recommendation, and accommodation selection. As a variant of the skyline query, the skyline-join query solves the skyline problem over multiple datasets. Skyline-join queries generally assume complete databases. However, with the widespread use of automatic information extraction and aggregation, incomplete datasets have become common. When attribute values are missing, existing skyline-join algorithms for incomplete datasets generally rely either on probability or on scoring with crowdsourcing.

To better reflect reality and be more user-centric, this thesis proposes a crowdsourcing-based method for skyline-join queries over incomplete data. The main idea is to use crowdsourcing to infer pairwise preferences between tuples when the values of certain attributes are unknown. Specifically, the proposed solution considers two key factors used in existing crowdsourcing-enabled algorithms: minimizing the cost of crowdsourcing by exploiting the preference relationships among tuples on known attributes, and reducing the number of waiting rounds by posing questions to the crowd in parallel.

Skyline-join queries with crowdsourcing are divided into two categories, depending on whether the query includes attribute dimensions of the incomplete dataset whose values are known: partial skyline-join with crowdsourcing and all skyline-join with crowdsourcing. In partial skyline-join with crowdsourcing, all attributes of one of the two datasets are known and one attribute of the other dataset is unknown, and the query involves only the unknown attribute, which requires crowdsourcing, and the join attributes. In all skyline-join with crowdsourcing, the query involves not only the unknown attribute and the join attributes of the incomplete dataset, but also other attributes whose values are known.

The main contents of this thesis are summarized as follows:

(1) For partial skyline-join queries with crowdsourcing, this thesis proposes the Partial Skyline-Join with Crowdsourcing algorithm (PSJCrowd). First, the BNL (block-nested-loops) algorithm filters the dataset whose attributes are all known; second, a pairwise-comparison tournament algorithm filters the incomplete dataset; finally, a level-preference-tree index is built from the attribute preferences on the complete dataset to filter the global data and return the skyline-join query results.

(2) For all skyline-join queries with crowdsourcing, this thesis proposes a crowdsourcing-based global skyline-join query processing method. First, the complete dataset is filtered; second, a level-preference-tree index is built on the known attributes of the incomplete dataset; then, the All Skyline-Join with Crowdsourcing on single dataset (ASJCrowd-single) algorithm is proposed to filter the incomplete dataset; finally, a global level-preference-tree index is built from the known attributes of the incomplete and complete datasets, and the All Skyline-Join with Crowdsourcing on multiple datasets (ASJCrowd-multiple) algorithm is proposed to filter the joined tuples based on this global index and the results of each round of crowdsourcing.

(3) Experiments demonstrate the effectiveness of the skyline-join algorithms on incomplete datasets in reducing both the cost and the delay of crowdsourcing. The experiments analyze running time, crowdsourcing cost, and crowdsourcing delay as the number of attribute dimensions and the datasets vary, and include a comparison with the CrowdSky algorithm on a single dataset. The comparison shows that both algorithms perform well, and that PSJCrowd and ASJCrowd have clear advantages over the baseline method and CrowdSky.
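
As an illustration of the two filtering ideas named in the abstract, the following is a minimal Python sketch, not the thesis's PSJCrowd implementation. It assumes that smaller attribute values are preferred, and it uses a generic single-elimination tournament in place of the thesis's tournament algorithm, whose details the abstract does not give. The names `dominates`, `bnl_skyline`, `tournament_best`, and the `prefer` callback (a stand-in for one question posed to the crowd) are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Row = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b on every attribute and strictly
    better on at least one (assumption: smaller values are better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def bnl_skyline(rows: List[Row]) -> List[Row]:
    """Block-nested-loops style skyline over tuples with all values known.
    The window holds the tuples not dominated by anything seen so far."""
    window: List[Row] = []
    for r in rows:
        if any(dominates(w, r) for w in window):
            continue  # r is dominated, so it cannot be in the skyline
        # r survives: drop window tuples that r dominates, then keep r
        window = [w for w in window if not dominates(r, w)]
        window.append(r)
    return window


def tournament_best(
    items: List[str], prefer: Callable[[str, str], str]
) -> Tuple[str, int]:
    """Single-elimination tournament on one unknown attribute.
    prefer(a, b) stands in for a crowd question and returns the preferred
    item. Comparisons within a round are independent, so they could be
    posted to the crowd in parallel; the round count models the latency."""
    if not items:
        raise ValueError("items must not be empty")
    current = list(items)
    rounds = 0
    while len(current) > 1:
        winners = [
            prefer(current[i], current[i + 1])
            for i in range(0, len(current) - 1, 2)
        ]
        if len(current) % 2 == 1:
            winners.append(current[-1])  # odd item out advances unchallenged
        current = winners
        rounds += 1
    return current[0], rounds
```

In this sketch the crowdsourcing cost corresponds to the number of `prefer` calls and the delay to the returned round count. A tournament over n items asks n - 1 questions but needs only about log2(n) rounds, which is the kind of cost and latency trade-off the abstract refers to.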
Keywords/Search Tags: Incomplete data, skyline query, skyline-join query, crowdsourcing