Multi-Label Crowdsourcing Learning

Posted on:2019-03-31

Degree:Doctor

Type:Dissertation

Country:China

Candidate:S Y Li

Full Text:PDF

GTID:1368330572465064

Subject:Computer application technology

Abstract/Summary:

Multi-label learning (MLL), which deals with examples associated with multiple labels, has received significant attention. Conventional MLL requires ground-truth labels for learning, which are expensive and limited in supply. In contrast, crowdsourcing provides an alternative way to collect labeling information by distributing tasks to many easily accessible, low-cost workers. This dissertation studies multi-label crowdsourcing learning (MLC) from the following aspects:

1. MLC considering label correlations and the variance in crowd workers' expertise. In MLC, the annotations contain errors, and their quantity and quality strongly affect the estimation of label correlations. We propose NAM, which models each worker's labeling accuracy on each label and takes into account the correlations among local annotations. Based on the idea that instances that are similar in the feature space should also receive similar annotations, we exploit feature-space information to improve the estimation of label correlations, and we account for the local influence of neighbors' annotations. Since the labeling budget is often limited, we also extend NAM to active crowdsourcing, which significantly reduces the number of annotations required. Experiments validate the effectiveness of our proposals.

2. MLC considering label correlations and crowd workers' specific labeling behavior. Because examining every label is laborious, or because workers are uncertain about some concepts, we observe that crowd workers tend to adopt an effort-saving annotation behavior: rather than carefully annotating every relevant label, they prefer to scan the labels, tag the few they consider most relevant, and leave the rest untouched. We propose RAM, which treats the tagged labels as more relevant than the other labels and models each worker's expertise as the ability to distinguish the correct relevance order between pairs of labels; this also naturally captures the relevance comparison relationships between labels. We also extend RAM to active crowdsourcing learning. Experiments validate the effectiveness of the proposals.

3. Fast MLC with incomplete annotations. Carefully checking and tagging all labels is laborious and can even lead to unexpected worker behavior and labeling errors, so we consider a learning setting that requires much less labeling effort from the crowd: workers tag only some of the labels for each instance, and we learn from these incomplete annotations. We propose the CRIA method. Exploiting the global low-rank structure among workers, instances, and labels, CRIA first estimates the complete annotations and then aggregates them. By using well-developed, highly efficient matrix packages for optimization and voting methods for aggregation, CRIA is far superior to previous work in terms of both performance and efficiency. We also extend CRIA to active crowdsourcing learning. Experiments validate the effectiveness of the proposals.

4. Bad worker detection in MLC. Spammers and adversarial workers not only waste the labeling budget but also degrade the overall quality of the annotations. We propose the WorkerAna method to detect such bad workers. With no feature information about the workers and only a small number of annotations available, and based on the idea that good workers and adversarial workers should form two separate clusters while spammers behave like outliers, we learn the workers' representations in a latent subspace and conduct worker analysis. Experiments validate the effectiveness of WorkerAna. We also extend the idea of WorkerAna to partial view clustering, which can handle multi-view data with incomplete views.
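Aspect 1 models each worker's labeling accuracy separately for each label. The sketch below is not NAM: it ignores label correlations and feature-space neighbors, and only illustrates, under our own assumptions, how per-(worker, label) accuracies can drive aggregation. It uses a simple iteratively re-weighted majority vote in the spirit of one-coin Dawid-Skene models; the function name `aggregate_per_label` and its parameters are hypothetical.

```python
import math
from collections import defaultdict


def aggregate_per_label(annotations, n_iter=10, smoothing=1.0):
    """Iteratively re-weighted voting with one accuracy per (worker, label).

    annotations: dict mapping (worker, instance, label) -> 0 or 1.
    smoothing must be > 0 so each accuracy stays strictly inside (0, 1).
    Returns (truth, weight): truth maps (instance, label) -> 0/1 and
    weight maps (worker, label) -> log-odds weight of that worker's votes.
    """
    # Group the votes by (instance, label).
    votes = defaultdict(list)
    for (worker, inst, label), value in annotations.items():
        votes[(inst, label)].append((worker, value))

    # Equal initial weights make the first round a plain majority vote.
    weight = {(w, l): 1.0 for (w, _, l) in annotations}
    truth = {}
    for _ in range(n_iter):
        # Weighted vote per (instance, label); ties resolve to 0.
        for (inst, label), vs in votes.items():
            score = sum(weight[(w, label)] * (1 if v else -1) for w, v in vs)
            truth[(inst, label)] = 1 if score > 0 else 0
        # Re-estimate each worker's accuracy on each label separately.
        hits = defaultdict(int)
        total = defaultdict(int)
        for (worker, inst, label), value in annotations.items():
            total[(worker, label)] += 1
            hits[(worker, label)] += int(value == truth[(inst, label)])
        for key in total:
            # Smoothed accuracy -> log-odds weight; an accuracy below 0.5
            # gives a negative weight, which flips that worker's votes.
            acc = (hits[key] + smoothing) / (total[key] + 2 * smoothing)
            weight[key] = math.log(acc / (1 - acc))
    return truth, weight


# Toy data: w1 and w2 are always right, w3 always flips the answer.
hidden = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1}
annotations = {}
for (inst, label), y in hidden.items():
    annotations[("w1", inst, label)] = y
    annotations[("w2", inst, label)] = y
    annotations[("w3", inst, label)] = 1 - y
estimate, weight = aggregate_per_label(annotations)
print(estimate == hidden)  # True
```

Keeping a separate weight per label lets a worker who is reliable on some labels but poor on others be trusted selectively, which is the point the first aspect makes about per-label expertise.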
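Aspect 2 treats the labels a worker tags as more relevant than the ones left untouched. As a hypothetical illustration (not RAM's probabilistic model), the helpers below turn partial tags into pairwise preferences, rank labels by pairwise wins (a Borda-style count), and score each worker by the fraction of their pairs that agree with the consensus ranking. That last score loosely mirrors the idea of expertise as the ability to order label pairs correctly. All function names are our own.

```python
from collections import Counter
from itertools import product


def relevance_pairs(tagged, all_labels):
    """Pairwise preferences implied by one worker's partial tags.

    Assumption: every tagged label is more relevant than every label the
    worker left untouched; untagged labels are not assumed irrelevant.
    """
    untagged = [label for label in all_labels if label not in tagged]
    return list(product(sorted(tagged), untagged))


def rank_by_wins(tag_sets, all_labels):
    """Rank labels by the number of pairwise comparisons they win."""
    wins = Counter({label: 0 for label in all_labels})
    for tagged in tag_sets:
        for winner, _loser in relevance_pairs(tagged, all_labels):
            wins[winner] += 1
    ranking = sorted(all_labels, key=lambda label: (-wins[label], label))
    return ranking, wins


def pair_agreement(tagged, all_labels, ranking):
    """Fraction of a worker's pairs that agree with a consensus ranking.

    Returns None when the worker's tags imply no comparisons at all.
    """
    position = {label: i for i, label in enumerate(ranking)}
    pairs = relevance_pairs(tagged, all_labels)
    if not pairs:
        return None
    return sum(position[a] < position[b] for a, b in pairs) / len(pairs)


labels = ["beach", "dog", "sky", "car"]
tag_sets = [{"beach", "sky"}, {"sky"}, {"sky", "dog"}]
ranking, wins = rank_by_wins(tag_sets, labels)
print(ranking)  # ['sky', 'beach', 'dog', 'car']
print([pair_agreement(t, labels, ranking) for t in tag_sets])  # [1.0, 1.0, 0.75]
```

Note that an untagged label is only "less relevant than the tagged ones", never forced to be irrelevant, which matches the effort-saving behavior described in the second aspect.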
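Aspect 3 first completes the incomplete annotations using low-rank structure and then aggregates them by voting. The toy below assumes a rank-1 structure on a workers-by-(instance, label) matrix of +1/-1 annotations and fits it by alternating least squares in pure Python. CRIA itself exploits a more general low-rank structure over workers, instances, and labels and relies on optimized matrix packages, so this only illustrates the complete-then-vote pipeline. `complete_rank1` and `vote` are hypothetical names.

```python
def complete_rank1(matrix, n_iter=50, reg=1e-3):
    """Rank-1 matrix completion by alternating least squares.

    matrix: list of rows, one per worker; each column is an
    (instance, label) pair; entries are +1, -1, or None if unannotated.
    Returns a dense matrix whose entry (i, j) is u[i] * v[j].
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    u = [1.0] * n_rows  # worker factors
    v = [0.0] * n_cols  # (instance, label) factors
    for _ in range(n_iter):
        # Fix u; ridge-regularized least squares for each v[j].
        for j in range(n_cols):
            obs = [i for i in range(n_rows) if matrix[i][j] is not None]
            num = sum(u[i] * matrix[i][j] for i in obs)
            v[j] = num / (sum(u[i] ** 2 for i in obs) + reg)
        # Fix v; the same update for each u[i].
        for i in range(n_rows):
            obs = [j for j in range(n_cols) if matrix[i][j] is not None]
            num = sum(v[j] * matrix[i][j] for j in obs)
            u[i] = num / (sum(v[j] ** 2 for j in obs) + reg)
    return [[u[i] * v[j] for j in range(n_cols)] for i in range(n_rows)]


def vote(completed):
    """Aggregate by the sign of each column sum (majority vote); ties -> -1."""
    return [1 if sum(col) > 0 else -1 for col in zip(*completed)]


observed = [
    [1, -1, None, -1],   # good worker
    [None, -1, 1, -1],   # good worker
    [1, None, 1, None],  # good worker
    [-1, 1, -1, None],   # adversarial worker: always answers the opposite
]
completed = complete_rank1(observed)
print(vote(completed))  # [1, -1, 1, -1]
```

In this toy, the sign of a worker factor `u[i]` ends up encoding whether that worker agrees or disagrees with the consensus, so the completed entries for the adversarial worker are predicted as the opposite of the majority answer.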
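Aspect 4 rests on the intuition that good and adversarial workers form two opposing clusters while spammers behave like outliers. WorkerAna learns worker representations in a latent subspace; the heuristic below swaps that for pairwise signed agreement between workers, so it is a simplified stand-in rather than the dissertation's method. The function names, the `spam_threshold` value, and the rule that the larger cluster is the good one are all assumptions.

```python
def agreement(a, b):
    """Signed agreement in [-1, 1] on the items both workers annotated.

    a, b: dicts mapping item -> +1 or -1. Returns None if nothing is shared.
    """
    shared = [k for k in a if k in b]
    if not shared:
        return None
    return sum(a[k] * b[k] for k in shared) / len(shared)


def analyze_workers(workers, spam_threshold=0.3):
    """Split workers into good / adversarial / spammer groups (heuristic).

    workers: dict mapping worker name -> {item: +1 or -1}.
    Assumptions: spammers answer at random, so their agreement with everyone
    is near 0; adversarial workers are strongly anti-correlated with good
    workers; good workers form the larger of the two remaining groups.
    """
    names = sorted(workers)
    corr = {}
    for x in names:
        for y in names:
            if x != y:
                c = agreement(workers[x], workers[y])
                if c is not None:
                    corr[(x, y)] = c
    # Mean absolute agreement: low values mark outlier-like spammers.
    strength = {}
    for x in names:
        vals = [abs(c) for (p, _), c in corr.items() if p == x]
        strength[x] = sum(vals) / len(vals) if vals else 0.0
    spammers = [x for x in names if strength[x] < spam_threshold]
    rest = [x for x in names if x not in spammers]
    if not rest:
        return {"good": [], "adversarial": [], "spammer": spammers}
    # Two clusters: workers positively correlated with an anchor vs. the rest.
    anchor = rest[0]
    same = [y for y in rest if y == anchor or corr.get((anchor, y), 0.0) > 0]
    other = [y for y in rest if y not in same]
    good, adversarial = (same, other) if len(same) >= len(other) else (other, same)
    return {"good": good, "adversarial": adversarial, "spammer": spammers}


truth = {0: 1, 1: -1, 2: 1, 3: -1}
workers = {
    "g1": dict(truth),
    "g2": dict(truth),
    "g3": dict(truth),
    "adv": {k: -v for k, v in truth.items()},
    "spam": {k: 1 for k in truth},  # always answers +1
}
print(analyze_workers(workers))
# {'good': ['g1', 'g2', 'g3'], 'adversarial': ['adv'], 'spammer': ['spam']}
```

With only a few shared items, a random spammer's agreement will rarely be exactly zero, so on real data the threshold would need tuning; this sparsity is one reason a learned latent representation, as in WorkerAna, is attractive.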

Keywords/Search Tags:

machine learning, multi-label learning, crowdsourcing, label correlation, worker expertise, incomplete annotations, active crowdsourcing, worker analysis

Related items

1	Online Task Assignment In Crowdsourcing Based On Worker Proficiency
2	Label Aggregation In Crowdsourcing
3	Research On Annotation Quality Control In Crowdsourcing System
4	A Worker Recommendation Mechanism With High Acceptance Rates In Crowdsourcing Systems
5	The Research Of Relevant Theory And Techniques For Spatial Crowdsourcing
6	Research On Machine Learning Algorithms For Data With Multiple Annotations
7	Research On Multi-label Active Learning Based On Label Correlation
8	Multi-Agent Based Optimization For Crowdsourcing System Considering Worker Characteristics
9	Research On Worker Recruitment And Task Assignment Mechanisms In Spatial Crowdsourcing Systems
10	Research On Noisy Label Based Machine Learning Methods Through Exploiting Crowdworker Feature