
Study Of Crowdsourced Learning Algorithm For Crowdsourced Labeling

Posted on: 2016-08-07 | Degree: Master | Type: Thesis
Country: China | Candidate: H Sun
GTID: 2308330470467734 | Subject: Computer technology
Abstract/Summary:
With the rapid development of computer and Internet technology, society has entered the era of big data. Machine learning and data mining algorithms and models need large amounts of labeled training data, so huge volumes of data must be labeled quickly and with high quality. Crowdsourced labeling, which uses crowdsourcing to label data cheaply and quickly, has attracted wide attention and is applied in many settings. However, labels obtained through crowdsourcing are noisy and often wrong, because they are collected from many workers with different professional backgrounds and abilities, so they cannot be used directly as true labels. Crowdsourced learning algorithms are therefore needed to filter the collected labels and infer the true labels.

In this thesis, crowdsourced learning algorithms for crowdsourced labeling are studied on the basis of the workers' labeling behavior and the characteristics of the tasks. The main contributions are as follows:

(1) A method for estimating task difficulty from the balance of the crowdsourced labels is proposed. For each task, a balance level is defined from a statistical analysis of how many labels of each class the task received, and the task's difficulty is then estimated from this balance level.

(2) A crowdsourced learning algorithm based on worker ability and task difficulty is proposed. First, the difficulty of each task is estimated with the method in (1). Then, the accuracy and agreement level of each worker are estimated. Next, each worker's ability is estimated from the task difficulties together with the worker's accuracy and agreement level. Finally, the final labels are generated according to the workers' abilities, which improves the quality of the crowdsourced labels.

(3) A semi-supervised crowdsourced learning algorithm based on task features is proposed. First, the tasks labeled by crowd workers are clustered by their features. Then, within each cluster, the ability of each worker on that kind of task is learned from the small subset of tasks whose true labels are known. Finally, the labels of the remaining tasks are generated. Because task features are incorporated into the algorithm, this improves the quality of crowdsourced labeling on tasks with particular characteristics.
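The pipeline in (1) and (2) (difficulty from label balance, then worker ability, then ability-weighted labels) can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so every concrete choice below is a hypothetical assumption, not the thesis's method. Balance is measured as the normalized entropy of a task's label counts and used directly as difficulty. Accuracy is measured against the unweighted majority vote, with harder tasks weighted more. Agreement is the average fraction of co-workers on a task who gave the same label. Ability mixes accuracy and agreement with a parameter `alpha`. All function names are illustrative.

```python
import math
from collections import Counter, defaultdict

# labels[task][worker] -> class label given by that worker to that task
# (assumes every task has at least one label)


def balance_difficulty(task_labels, num_classes):
    """Hypothetical balance measure: normalized entropy of the label counts.

    0.0 means every worker gave the same label (easy task);
    1.0 means the label counts are perfectly balanced (hard task).
    """
    counts = Counter(task_labels.values())
    n = sum(counts.values())
    if n == 0 or num_classes < 2:
        return 0.0
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return entropy / math.log(num_classes)


def majority_vote(task_labels):
    # Ties go to the label encountered first.
    return Counter(task_labels.values()).most_common(1)[0][0]


def worker_abilities(labels, difficulty, alpha=0.5):
    """Ability = alpha * difficulty-weighted accuracy + (1 - alpha) * agreement."""
    reference = {t: majority_vote(tl) for t, tl in labels.items()}
    acc_num, acc_den = defaultdict(float), defaultdict(float)
    agree_sum, agree_cnt = defaultdict(float), defaultdict(int)
    for t, tl in labels.items():
        weight = 1.0 + difficulty[t]  # assumption: harder tasks count more
        counts = Counter(tl.values())
        others = len(tl) - 1
        for w, y in tl.items():
            acc_num[w] += weight * (y == reference[t])
            acc_den[w] += weight
            if others > 0:
                # fraction of the other workers on this task who gave the same label
                agree_sum[w] += (counts[y] - 1) / others
                agree_cnt[w] += 1
    ability = {}
    for w in acc_den:
        accuracy = acc_num[w] / acc_den[w]
        agreement = agree_sum[w] / agree_cnt[w] if agree_cnt[w] else accuracy
        ability[w] = alpha * accuracy + (1 - alpha) * agreement
    return ability


def aggregate(labels, num_classes, alpha=0.5):
    """Estimate difficulty, then ability, then pick each task's ability-weighted label."""
    difficulty = {t: balance_difficulty(tl, num_classes) for t, tl in labels.items()}
    ability = worker_abilities(labels, difficulty, alpha)
    final = {}
    for t, tl in labels.items():
        score = defaultdict(float)
        for w, y in tl.items():
            score[y] += ability[w]
        final[t] = max(score, key=score.get)
    return final, difficulty, ability
```

On a toy input where three workers agree on one task and split 2 to 1 on another, the unanimous task gets difficulty 0, the split task gets a difficulty of about 0.92, and the dissenting worker ends up with a lower ability than the other two. Weighting hard tasks more is one possible design; one could equally down-weight them to forgive mistakes on contested tasks, and the abstract does not say which direction is used.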
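The feature-based semi-supervised idea in (3) can also be sketched, again under assumptions that are not stated in the abstract. Tasks are clustered with plain k-means (deterministic farthest-point initialization) on numeric feature vectors. A worker's ability in each cluster is their Laplace-smoothed accuracy on the gold-labeled tasks of that cluster; the prior of 0.5 for a worker with no gold tasks there assumes two classes. The remaining tasks are labeled by an ability-weighted vote. The clustering method, the smoothing, and names such as `semi_supervised_labels` are hypothetical.

```python
import math
from collections import defaultdict


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means with deterministic farthest-point initialization."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # next centroid: the point farthest from all centroids chosen so far
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def semi_supervised_labels(features, labels, gold, k, smoothing=1.0):
    """features[t]: numeric feature vector of task t
    labels[t][w]: crowd label of worker w on task t
    gold[t]: known true label for a small subset of tasks
    """
    tasks = sorted(features)
    cluster = dict(zip(tasks, kmeans([features[t] for t in tasks], k)))

    # Per-cluster worker accuracy on the gold tasks.
    correct, seen = defaultdict(float), defaultdict(float)
    for t, y_true in gold.items():
        for w, y in labels.get(t, {}).items():
            correct[(cluster[t], w)] += (y == y_true)
            seen[(cluster[t], w)] += 1

    final = {}
    for t in tasks:
        if t in gold:
            final[t] = gold[t]  # keep the known true label
            continue
        score = defaultdict(float)
        for w, y in labels.get(t, {}).items():
            key = (cluster[t], w)
            # Laplace-smoothed accuracy; 0.5 for a worker with no gold tasks here
            score[y] += (correct[key] + smoothing) / (seen[key] + 2 * smoothing)
        final[t] = max(score, key=score.get) if score else None
    return final, cluster
```

The point of clustering first is that a worker's reliability may depend on the kind of task. If worker A is right on every gold task in one cluster and wrong in the other, and worker B is the reverse, their pooled accuracy ties at 50%, but the per-cluster estimates let each worker win the vote in the cluster where they are reliable.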
Keywords/Search Tags:Crowdsourcing, Crowdsourced Labeling, Crowdsourced Learning