
Label Aggregation In Crowdsourcing

Posted on: 2021-06-25
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L A Yin
Full Text: PDF
GTID: 1488306503982239
Subject: Computer Science and Technology
Abstract/Summary:
Label aggregation in crowdsourcing aims to infer the true labels of objects from the labels or information provided by different labelers or information sources. This thesis proposes novel and effective aggregation algorithms from three aspects: using only the labels from sources, utilizing both labels and object features, and dynamic label aggregation.

Traditional (static) label aggregation algorithms usually rely on generative probabilistic graphical models, which exploit sophisticated relationships to generate labels. The relationships are defined by specific probabilistic functions, but the corresponding model optimization is rather complicated, and such models are not easy to implement or extend. To alleviate this problem, this thesis proposes label-aware autoencoders (LAA), which use techniques from variational autoencoders (VAEs) to build a neural-network-based framework for label aggregation. LAA contains a classifier and a reconstructor, which are optimized simultaneously in an unsupervised manner to perform label inference. The proposed model is easy to understand, implement, and extend within the neural-network framework, and the learned network weights are interpretable. Experiments on real-world datasets show that the proposed model significantly improves inference accuracy compared with state-of-the-art algorithms.

By using both labels and object features, an aggregation algorithm can achieve higher inference accuracy. Existing algorithms usually adopt the supervised-learning paradigm directly, replacing ground-truth labels with noisy crowdsourced labels in order to learn a classifier over object features. However, without explicitly handling label noise, such algorithms are likely to end up with imprecise decision boundaries. This thesis proposes to exploit clustering to alleviate label noise. Assuming that objects in the same fine-grained cluster share similar true labels, a cluster label that summarizes all the labeling information of the objects in that cluster is used to infer true labels, so label noise is alleviated compared with relying on the labels of a single object. Based on this idea of fine-grained clustering, this thesis proposes three models: an instance grouping model (InGroup), a clustering-based label-aware autoencoder (CLA), and a deep clustering-based aggregation model (DCAM). InGroup constructs relationships between object features and cluster labels using traditional techniques from probabilistic graphical models. CLA uses a generative framework that introduces a deep generative process to generate object features and labels from clusters simultaneously; by extending the VAE framework, CLA optimizes its parameters through an evidence lower bound together with a regularizer and exploits cluster labels to infer the true labels of objects. DCAM uses a classification framework that regularizes label generation with deep clustering and is easy to implement and optimize. The proposed models integrate techniques from probabilistic graphical models, neural networks, deep clustering, and VAEs to infer true labels via clustering. Experimental results on real-world datasets show that the proposed models significantly improve inference accuracy compared with state-of-the-art algorithms. The effect of the number of clusters is further illustrated and discussed, which supports the idea of fine-grained clustering for label aggregation.
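The following Python code is a minimal sketch of the fine-grained clustering idea only, not an implementation of InGroup, CLA, or DCAM: objects are grouped by their features with an off-the-shelf clustering method (KMeans is an assumption here), the noisy labels of all objects in a cluster are pooled, and every object inherits its cluster's majority label. All function and variable names are illustrative.

import numpy as np
from sklearn.cluster import KMeans

def cluster_vote(features, worker_labels, n_clusters=20, n_classes=2):
    # features: (n_objects, d) array of object features.
    # worker_labels: list of lists; worker_labels[i] holds the noisy labels given to object i.
    clusters = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(features)

    # Pool the label counts of all objects that fall into the same fine-grained cluster.
    cluster_counts = np.zeros((n_clusters, n_classes))
    for i, labels in enumerate(worker_labels):
        for y in labels:
            cluster_counts[clusters[i], y] += 1

    # Every object inherits the majority label of its cluster.
    cluster_label = cluster_counts.argmax(axis=1)
    return cluster_label[clusters]

# Toy usage: 200 objects with 2-D features and 3 noisy labels each (35% flip rate).
rng = np.random.default_rng(0)
true_y = rng.integers(0, 2, size=200)
features = true_y[:, None] + 0.3 * rng.standard_normal((200, 2))
noisy = [[int(y) if rng.random() > 0.35 else 1 - int(y) for _ in range(3)] for y in true_y]
print("accuracy:", (cluster_vote(features, noisy) == true_y).mean())

Because each cluster pools several objects' labels, the effective number of votes per decision grows, which is the noise-alleviation effect the thesis attributes to fine-grained clustering.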
Dynamic label aggregation assumes that labels are collected sequentially and that an algorithm can choose to collect labels from credible sources, thereby reducing the overall cost of label collection. Existing algorithms usually measure the utility of each object-source pair, but they either depend on specific (simple) aggregation models or model the utility of a single object only, ignoring the effect that acquiring a label has on all objects. This thesis proposes an algorithm named expected entropy reduction (EER) for dynamic label aggregation. EER defines a global utility function based on entropy minimization and constructs an approximation that makes the utility computationally feasible. Furthermore, EER develops a self-adaptive strategy to avoid local minima. Experiments show that EER achieves higher inference accuracy than competing algorithms.
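As a simplified illustration of an expected-entropy-reduction criterion (not the thesis's exact EER model, which additionally handles source credibility, a global utility with its approximation, and a self-adaptive strategy), the Python sketch below scores each object by how much its posterior entropy is expected to drop after receiving one more label, with Dirichlet-smoothed label counts standing in for the posterior; the names and the smoothing choice are assumptions.

import numpy as np

def entropy(p):
    p = np.clip(p, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=-1)

def posterior(counts, alpha=1.0):
    # Dirichlet-smoothed posterior over classes from the labels collected so far.
    smoothed = counts + alpha
    return smoothed / smoothed.sum(axis=-1, keepdims=True)

def expected_entropy_reduction(counts):
    # counts: (n_objects, n_classes) matrix of label counts collected so far.
    p = posterior(counts)            # current belief for each object
    h_now = entropy(p)               # current entropy of each belief
    n_objects, n_classes = counts.shape
    h_next = np.zeros(n_objects)
    for c in range(n_classes):
        bumped = counts.copy()
        bumped[:, c] += 1            # pretend the next label says class c
        # expectation over which label arrives, weighted by the current belief
        h_next += p[:, c] * entropy(posterior(bumped))
    return h_now - h_next            # expected drop in entropy per object

# Toy usage: four objects with different amounts of evidence.
counts = np.array([[3, 0], [2, 2], [1, 0], [5, 4]], dtype=float)
gain = expected_entropy_reduction(counts)
print("query object:", int(gain.argmax()), "expected reductions:", np.round(gain, 3))

The object with the largest expected reduction would be queried next; extending the score to object-source pairs would additionally require a per-source reliability model, which this sketch omits.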
Keywords/Search Tags:Crowdsourcing, label aggregation, neural networks, clustering methods, unsupervised learning, machine learning