
A quantitative quality control model for parallel and distributed crowdsourcing tasks

Posted on: 2015-08-14
Degree: Ph.D
Type: Dissertation
University: University of Maryland, Baltimore County
Candidate: Zhu, Shaojian
Full Text: PDF
GTID: 1478390017988955
Subject: Information Science
Abstract/Summary:
Crowdsourcing is an emerging research area that has grown rapidly in the past few years. Although crowdsourcing has demonstrated its potential in numerous domains, several key challenges continue to hinder its application. One of the major challenges is quality control: how can crowdsourcing requesters effectively control the quality of the data contributed by crowdsourcing workers? To address this challenge, a data-driven empirical quality control model for crowdsourcing was designed to automatically assess the quality of an individual worker's contribution to a task, with little manual intervention or external data support. The model categorizes the data from each crowdsourcing worker into one of several quality groups. First, the model estimates thresholds for the quality groups by analyzing two categories of quantitative training data from tasks (i.e., user effort measures and task characteristics). It then incorporates the expected variance within individual workers to adjust these initial estimates. The resulting thresholds are used to judge the quality of each worker's contribution. Two studies in different task domains were conducted to evaluate the model, and the results of both support its effectiveness. A comparison study was also conducted between our model and iterative voting, a commonly used quality judging method in crowdsourcing; its results confirmed the advantages of our model over iterative voting. Inspired by the Gold Standard method, a blacklist-based enhancement was added to the original model to defend against gaming, under the assumption that gamers always cheat and never provide valid data. Finally, a Java implementation of the quality judging model was released as an open-source package to allow easy adoption of the model.
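The train/refine/classify pipeline described above can be sketched in Java (the language of the released implementation). Everything here is an illustrative assumption rather than the dissertation's actual algorithm: the class name `QualityClassifier`, the quantile-based threshold estimation, and the use of a single within-worker standard deviation to shift the boundaries are all stand-ins for whatever the model computes from its two categories of quantitative measures.

```java
import java.util.Arrays;

// Hypothetical sketch of a three-stage (train, refine, classify) quality
// judging model. Assumes each contribution is summarized by one numeric
// quality score, with higher scores indicating higher quality.
public class QualityClassifier {
    private double[] thresholds; // boundaries between adjacent quality groups

    // Stage 1 (training): estimate initial group boundaries from
    // quantitative training data by splitting the sorted scores into
    // equal-sized quantile groups.
    public void train(double[] trainingScores, int groups) {
        double[] sorted = trainingScores.clone();
        Arrays.sort(sorted);
        thresholds = new double[groups - 1];
        for (int i = 1; i < groups; i++) {
            thresholds[i - 1] = sorted[i * sorted.length / groups];
        }
    }

    // Stage 2 (refinement): shift the boundaries by the expected variance
    // within individual workers, so that a contribution near a boundary is
    // not demoted by normal worker-level fluctuation (assumed direction:
    // lowering a threshold is lenient toward borderline contributions).
    public void refine(double withinWorkerStdDev) {
        for (int i = 0; i < thresholds.length; i++) {
            thresholds[i] -= withinWorkerStdDev;
        }
    }

    // Stage 3 (classification): map a contribution's score to a quality
    // group index, where 0 is the lowest-quality group.
    public int classify(double score) {
        int group = 0;
        for (double t : thresholds) {
            if (score >= t) group++;
        }
        return group;
    }
}
```

With three groups trained on scores 1..6, the initial boundaries fall at 3 and 5, so a score of 0.5 lands in group 0 and a score of 6 in group 2; refinement then loosens those boundaries by the expected within-worker variance.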
As its theoretical contribution, this dissertation proposes a three-stage (i.e., training, refinement, and classification) quality judging model that automatically determines data quality based on two categories of quantitative measures for crowdsourcing tasks. Practically, the crowdsourcing community can directly use or build on this work to control crowdsourcing data quality more effectively.
Keywords/Search Tags: Crowdsourcing, Quality, Model, Task, Data, Quantitative