Key Techniques Of Crowdsourced Query Processing

Posted on: 2016-06-30
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J H Feng
Full Text: PDF
GTID: 1318330536450229
Subject: Computer Science and Technology
Abstract/Summary:
The complexity and diversity of data bring great challenges to query processing, and it is usually hard to achieve high-quality results on big data with machines alone. In recent years, with the rapid development of crowdsourcing technology, leveraging crowdsourcing for query processing has become a promising research topic. Because crowdsourcing tasks are completed by the general public, crowdsourced query processing faces two challenges. First, workers on a crowdsourcing platform receive financial rewards for the answers they provide, so, given massive data, the question is how to effectively eliminate unnecessary crowdsourcing questions in order to save human cost. Second, current crowdsourcing platforms cannot guarantee the quality of the answers returned by workers, so the question is how to process those answers to achieve high-quality results. To address these challenges, this dissertation proposes effective methods for crowdsourced query processing. The main contributions are summarized as follows.

1. Adaptive crowdsourced join with multiple attributes: Existing approaches generate a relatively large number of crowdsourcing questions and may lose true matching pairs. To address this problem, this dissertation proposes a hybrid approach that analyzes attributes and combines category, sorting, and clustering techniques to filter out as many non-matching records as possible while retaining the true matching records. We also design an adaptive attribute-selection strategy that adapts to changes in the design of crowdsourcing tasks, so as to achieve high-quality join results at low cost. In addition, this dissertation devises a weighted voting method for integrating answers, which helps task requesters obtain high-quality results.

2. Crowdsourced query optimization for selection queries with multiple predicates: Traditional query optimization techniques cannot process selection queries with multiple predicates in crowdsourcing. To address this problem, this dissertation proposes a sampling-based framework that first processes a sample of objects to find a high-quality predicate order and then asks the crowd about the remaining objects using that order, which can significantly reduce the cost. Existing methods do not take into account the cost of generating the order. To reduce this cost, we devise a random-based selection method that selects predicate orders randomly. Since low-quality permutations may incur a large cost, we further propose a filtering-based algorithm that selects permutations with the help of predicate selectivity.

3. Incremental answer integration in crowdsourcing: Current answer integration methods cannot achieve both quality and efficiency at the same time. This dissertation proposes an incremental answer integration framework (Inquire). On the one hand, the framework includes a question model and two incremental strategies that incorporate worker quality to compute each question's result instantly. On the other hand, to improve the accuracy of the results, this dissertation proposes a new worker model and devises an effective strategy for updating it, so that worker quality is quantified accurately.
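To make the role of predicate selectivity in ordering concrete, the following is a minimal Python sketch of a standard selectivity-based heuristic for conjunctive selection queries. It is not the dissertation's sampling framework or its filtering-based algorithm. It assumes that predicates are independent, that every crowd question costs the same, and that sampled answers are correct. The function names (`estimate_selectivity`, `order_predicates`, `expected_questions`) and the predicate labels are hypothetical.

```python
def estimate_selectivity(sample_answers):
    """Estimate, per predicate, the fraction of sampled objects that satisfy it.

    sample_answers: dict mapping a predicate name to a non-empty list of bools
    (True means the sampled object satisfied the predicate).
    """
    return {p: sum(v) / len(v) for p, v in sample_answers.items()}


def order_predicates(selectivity):
    """Ask the predicate that rejects the most objects first (lowest pass rate)."""
    return sorted(selectivity, key=lambda p: selectivity[p])


def expected_questions(order, selectivity):
    """Expected number of crowd questions per object for a given predicate order.

    An object is only asked about the next predicate if it passed all earlier
    ones, so each predicate's cost is weighted by the chance of reaching it
    (assuming independent predicates).
    """
    total, reach = 0.0, 1.0
    for p in order:
        total += reach
        reach *= selectivity[p]
    return total


# Hypothetical sample: answers for four sampled objects per predicate.
sample = {
    "A": [True, True, False, False],
    "B": [True, False, False, False],
    "C": [True, True, True, False],
}
sel = estimate_selectivity(sample)
best = order_predicates(sel)
print(best, expected_questions(best, sel))
print(expected_questions(["C", "A", "B"], sel))
```

Under these assumptions, placing the most selective predicate first lowers the expected number of questions per remaining object, which is the intuition behind spending a small sampling budget to learn a good order.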
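Weighted voting for answer integration can be illustrated with a generic sketch. The abstract does not specify how the dissertation derives its weights or its worker model, so the weights below are hypothetical quality scores supplied by the caller. The function name `weighted_vote` and the `default_weight` parameter are assumptions made for illustration only.

```python
from collections import defaultdict


def weighted_vote(answers, worker_weight, default_weight=0.5):
    """Return the label whose supporting workers have the largest total weight.

    answers: list of (worker_id, label) pairs collected for one question.
    worker_weight: dict mapping worker_id to a quality score (hypothetical).
    default_weight: weight used for workers with no recorded score.
    """
    if not answers:
        raise ValueError("no answers to integrate")
    totals = defaultdict(float)
    for worker_id, label in answers:
        totals[label] += worker_weight.get(worker_id, default_weight)
    # Iterating over sorted labels makes ties resolve deterministically
    # (the smallest label wins); labels must be mutually comparable.
    return max(sorted(totals), key=lambda label: totals[label])


# Hypothetical example: one reliable worker outweighs two less reliable ones.
answers = [("w1", "match"), ("w2", "non-match"), ("w3", "non-match")]
weights = {"w1": 0.9, "w2": 0.3, "w3": 0.4}
print(weighted_vote(answers, weights))
```

With equal weights this reduces to plain majority voting. The example shows how a quality-aware weighting can overturn a majority formed by less reliable workers.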
Keywords/Search Tags: Crowdsourcing, Join Operation, Multiple Predicates, Query Optimization, Answers Integration