
Research On Annotation Quality Control In Crowdsourcing System

Posted on: 2021-03-17    Degree: Master    Type: Thesis
Country: China    Candidate: J Z Tu    Full Text: PDF
GTID: 2428330611964274    Subject: Computer application technology
Abstract/Summary:
With the emergence of crowdsourcing systems, tasks that are difficult for computers but comparatively easy for online workers, e.g., sequence alignment, sentiment analysis and image annotation, are successfully addressed with crowdsourcing. A large number of traditionally time-consuming and costly annotation tasks that used to be conducted by experts have now shifted to cost-effective crowdsourced annotation, which speeds up data updating and promotes the development of machine learning and data mining. However, due to the uncertainty of online workers' label quality, data collected from crowdsourcing is often noisy, or even incorrect. Therefore, how to control the quality of crowdsourced annotation has important research value and wide application. In this paper, we focus on the widespread quality-control problems from the perspectives of multi-label crowd consensus, task assignment strategy and crowdsourcing with active learning. On this basis, we further explore attention-aware annotation by the crowd. The main work is as follows (a minimal illustrative sketch for each contribution is given after the abstract):

1) Research on multi-label answer aggregation in crowdsourcing: when acquiring labels from crowdsourcing platforms, a task may be associated with multiple labels, which is the so-called multi-label annotation. Most existing methods focus on single-label (multi-class and binary) tasks and ignore the inter-correlation between labels, and thus may produce aggregations of compromised quality. To mitigate this issue, we introduce a Multi-Label answer aggregation approach based on Joint Matrix Factorization (ML-JMF). ML-JMF selectively and jointly factorizes the sample-label association matrices collected from different annotators into products of individual and shared low-rank matrices. As such, it takes advantage of the robustness of low-rank matrix approximation to noise, and reduces the impact of unreliable annotators by assigning small (or zero) weights to their annotation matrices. In addition, it exploits the correlation among labels through the shared low-rank matrix, and the similarity between annotators through the individual low-rank matrices, to guide the factorization. ML-JMF pursues the low-rank matrices via a unified objective function and introduces an iterative technique to optimize it; it finally uses the optimized low-rank matrices and weights to infer the ground-truth labels. Experimental results on five real-world datasets show that ML-JMF can identify unreliable annotators, even spammers, and achieve high-quality aggregations.

2) Research on the task assignment strategy in crowdsourcing: it is desirable to wisely assign the appropriate tasks to the right workers, so that the overall annotation quality is maximized while the cost is reduced. When completing a crowdsourcing task, the features of the task itself often affect the worker's decision process, yet existing task assignment strategies either ignore this or only consider the impact of task features on label aggregation. To solve these problems, we propose a novel task assignment strategy (CrowdWT) that captures the complex interactions between tasks and workers and properly assigns tasks to workers. CrowdWT first develops a Worker Bias Model (WBM) to jointly model the workers' biases, the ground truths of tasks and the task features. WBM constructs a mapping between task features and worker annotations to dynamically assign a task to a group of workers who are more likely to give correct annotations for it. CrowdWT further introduces a Task Difficulty Model (TDM), which builds a kernel ridge regressor on task features to quantify the intrinsic difficulty of tasks and thus assign the difficult tasks to more reliable workers. Finally, CrowdWT combines WBM and TDM into a unified model to dynamically assign tasks to groups of workers and to recall more reliable, even expert, workers to annotate the difficult tasks. Our experimental results show that CrowdWT achieves high-quality answers within a limited budget and outperforms competitive methods.

3) Research on crowdsourcing with active learning: most multi-label answer aggregation methods ignore that crowd workers with different expertise are paid for their service and that the task requester usually has a limited budget. How to collect reliable annotations for multi-label data and how to compute the consensus within a budget is a rarely studied problem. To address it, we propose a novel approach for Active Multi-label Crowd Consensus (AMCC). AMCC accounts for the commonality and individuality of workers and assumes that workers can be organized into groups, each of which includes workers who share similar annotation behavior and label correlations. To achieve an effective multi-label consensus, AMCC models workers' annotations via a linear combination of commonality and individuality, and reduces the impact of unreliable workers by assigning smaller weights to their groups. To collect reliable annotations at reduced cost, AMCC introduces an active crowdsourcing learning strategy that selects sample-label-worker triplets: the selected sample and label are the most informative for the consensus model, and the selected worker can reliably annotate that sample at low cost. Our experimental results on multi-label datasets demonstrate the advantages of AMCC over state-of-the-art solutions in computing crowd consensus and in reducing the budget by choosing cost-effective triplets.

4) Research on attention-aware annotation by the crowd: existing quality-control solutions in crowdsourcing assume that workers' label quality is stable over time. In practice, a worker's attention level changes over time, and ignoring this can affect the reliability of the annotations. We therefore focus on a novel and realistic crowdsourcing scenario: attention-aware annotation. We propose a new probabilistic model that takes workers' attention into account to estimate label quality. Expectation propagation is adopted for efficient Bayesian inference of the model, and a generalized Expectation Maximization algorithm is derived to estimate both the ground truth of all tasks and the attention-dependent label quality of each individual crowd worker. In addition, the number of tasks best suited for a worker is estimated according to changes in attention. Experiments demonstrate that our method quantifies the relationship between workers' attention and label quality on the given tasks and improves the aggregated labels.
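The core idea behind ML-JMF can be illustrated with a minimal sketch: each annotator's sample-label matrix is factorized against a shared low-rank label-factor matrix, and annotators whose matrices are poorly reconstructed receive smaller weights. The objective, gradient updates and softmax-style weighting below (and the function name `joint_factorize`) are simplified stand-ins chosen for illustration, not the thesis's actual ML-JMF formulation.

```python
# Illustrative sketch (not ML-JMF itself): jointly factorize each annotator's
# sample-label matrix Y_m ~ U_m @ V, where V is a shared low-rank label-factor
# matrix (capturing label correlations) and U_m is an individual sample-factor
# matrix per annotator. Annotators whose matrices are poorly explained by the
# joint factorization receive smaller weights.
import numpy as np

def joint_factorize(Ys, rank=3, iters=200, lr=0.01, seed=0):
    """Ys: list of (n_samples, n_labels) annotation matrices in {0, 1}."""
    rng = np.random.default_rng(seed)
    n, L = Ys[0].shape
    M = len(Ys)
    V = 0.1 * rng.standard_normal((rank, L))           # shared label factors
    Us = [0.1 * rng.standard_normal((n, rank)) for _ in range(M)]
    w = np.full(M, 1.0 / M)                             # annotator weights

    for _ in range(iters):
        # Gradient steps on the weighted reconstruction error.
        gV = np.zeros_like(V)
        for m in range(M):
            R = Us[m] @ V - Ys[m]                       # residual for annotator m
            Us[m] -= lr * w[m] * (R @ V.T)
            gV += w[m] * (Us[m].T @ R)
        V -= lr * gV
        # Re-weight annotators: lower reconstruction error -> higher weight.
        errs = np.array([np.mean((Us[m] @ V - Ys[m]) ** 2) for m in range(M)])
        w = np.exp(-errs / (errs.mean() + 1e-12))
        w /= w.sum()

    # Consensus: weighted average of the reconstructions, thresholded at 0.5.
    consensus = sum(w[m] * (Us[m] @ V) for m in range(M))
    return (consensus > 0.5).astype(int), w

# Toy usage: three annotators, the third one answering at random ("spammer").
rng = np.random.default_rng(1)
truth = (rng.random((60, 6)) > 0.5).astype(float)
Ys = [truth.copy(), truth.copy(), (rng.random((60, 6)) > 0.5).astype(float)]
labels, weights = joint_factorize(Ys)
print("annotator weights:", np.round(weights, 3))
```

The weight update rewards annotators whose matrices are well explained by the shared factors, which is the mechanism by which unreliable annotators are down-weighted in this simplified setting.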
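The key step of the Task Difficulty Model, fitting a kernel ridge regressor from task features to a difficulty score, can be sketched with scikit-learn. The disagreement-based difficulty target, the worker-reliability dictionary and the round-robin routing below are illustrative assumptions, not CrowdWT's actual training signal or assignment rule.

```python
# Illustrative sketch of the TDM idea: learn task difficulty from task features
# with kernel ridge regression, then route the hardest tasks to reliable workers.
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 8))                  # task feature vectors
# Proxy difficulty target: fraction of workers disagreeing with the majority label.
disagreement = rng.random(200)

tdm = KernelRidge(kernel="rbf", alpha=1.0, gamma=0.1)
tdm.fit(X, disagreement)

X_new = rng.standard_normal((20, 8))               # features of unassigned tasks
difficulty = tdm.predict(X_new)

# Assign the most difficult tasks to the most reliable workers first.
worker_reliability = {"w1": 0.95, "w2": 0.80, "w3": 0.60}
by_reliability = sorted(worker_reliability, key=worker_reliability.get, reverse=True)
for rank_idx, task_idx in enumerate(np.argsort(-difficulty)):
    worker = by_reliability[rank_idx % len(by_reliability)]
    print(f"task {task_idx}: difficulty {difficulty[task_idx]:.2f} -> {worker}")
```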
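AMCC's active selection of sample-label-worker triplets can be approximated by a generic uncertainty-sampling rule: pick the (sample, label) entry whose current consensus probability is most uncertain, then pick the worker with the best reliability-to-cost trade-off. The entropy criterion and the `reliability / cost` score are simplifying assumptions, not AMCC's actual informativeness measure.

```python
# Illustrative sketch of active triplet selection for multi-label crowdsourcing.
import numpy as np

def select_triplet(consensus_prob, reliability, cost):
    """consensus_prob: (n_samples, n_labels) current P(label = 1);
    reliability, cost: per-worker arrays of the same length."""
    p = np.clip(consensus_prob, 1e-6, 1 - 1e-6)
    entropy = -(p * np.log(p) + (1 - p) * np.log(1 - p))   # binary entropy per entry
    sample, label = np.unravel_index(np.argmax(entropy), entropy.shape)
    worker = int(np.argmax(reliability / cost))             # cheapest reliable worker
    return int(sample), int(label), worker

rng = np.random.default_rng(0)
prob = rng.random((30, 5))                      # current consensus estimates
reliability = np.array([0.9, 0.7, 0.6])
cost = np.array([3.0, 1.0, 0.5])
print(select_triplet(prob, reliability, cost))  # -> (sample, label, worker)
```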
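The attention-aware setting can be illustrated with a small EM-style sketch in which a worker's per-annotation accuracy decays toward chance as attention fades over the session. The exponential attention curve, the shared decay rate and the least-squares M-step are assumptions made only for this sketch; the thesis instead performs Bayesian inference with expectation propagation and a generalized EM algorithm.

```python
# Illustrative sketch: one-coin EM for binary tasks where a worker's accuracy
# shrinks toward 0.5 as attention decays with the number of tasks already done.
import numpy as np

def attention(t, decay=0.02):
    """Attention on the t-th task of a worker's session (t = 0, 1, ...)."""
    return np.exp(-decay * t)

def em_attention_aware(annotations, n_tasks, iters=50):
    """annotations: list of (task, worker, position_in_session, label in {0, 1})."""
    workers = sorted({w for _, w, _, _ in annotations})
    base_acc = {w: 0.7 for w in workers}            # initial base accuracy per worker

    for _ in range(iters):
        # E-step: posterior over each task's true label from attention-weighted votes.
        log_odds = np.zeros(n_tasks)
        for i, w, t, y in annotations:
            acc = 0.5 + (base_acc[w] - 0.5) * attention(t)
            lr = np.log(acc / (1 - acc))
            log_odds[i] += lr if y == 1 else -lr
        post = 1.0 / (1.0 + np.exp(-log_odds))      # P(z_i = 1)

        # M-step: least-squares fit of each worker's base accuracy from the
        # expected correctness of every annotation and its attention weight.
        for w in workers:
            num, den = 0.0, 0.0
            for i, ww, t, y in annotations:
                if ww != w:
                    continue
                correct = post[i] if y == 1 else 1 - post[i]
                att = attention(t)
                num += att * (correct - 0.5)
                den += att * att
            base_acc[w] = float(np.clip(0.5 + num / max(den, 1e-12), 0.5, 0.99))

    return (post > 0.5).astype(int), base_acc

# Toy usage: 3 workers each label 5 tasks in session order.
ann = [(i, w, t, int(i % 2 == 0)) for w in ("a", "b", "c")
       for t, i in enumerate(range(5))]
labels, acc = em_attention_aware(ann, n_tasks=5)
print(labels, acc)
```

Because accuracy is modeled as 0.5 + (base accuracy - 0.5) * attention, later annotations automatically carry less weight in the E-step, one simple way to express the intuition that fading attention lowers label quality.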
Keywords/Search Tags: Crowdsourcing, Multi-label Answer Aggregation, Active Learning, Task Assignment, Attention