Research On Algorithms Of General And Filtered Answer Aggregation Based On Crowdsourcing

Posted on: 2020-04-08 | Degree: Master | Type: Thesis
Country: China | Candidate: R Zhang
GTID: 2428330572483897 | Subject: Software engineering

Abstract/Summary:
With the rapid development of the Internet, crowdsourcing has emerged as a new problem-solving model. As a business model, crowdsourcing usually relies on a crowdsourcing platform to recruit an undefined group of workers over the Internet and uses the wisdom of the crowd to solve problems that are difficult for computers. Crowdsourcing has received widespread attention from researchers in recent years. When it is used to solve a problem, answers are collected from a large number of unknown workers, and these workers are characterized by accessibility and heterogeneity. Accessibility means that network workers are numerous, widely distributed, and span a broad range of knowledge levels. Heterogeneity means that workers differ from one another in qualifications, experience, and personality traits, so there is no guarantee that every worker's answer is correct. Accessibility makes it possible to obtain answers at low cost, but heterogeneity makes the accuracy of those answers hard to control. For example, a worker who knows little about art can easily give a wrong answer when asked to appraise an artwork.

To obtain high-quality crowdsourced answers, researchers have proposed solutions at several stages of platform operation, such as task design and task assignment. Answer aggregation, an important link in this chain, has become a research hotspot. As an important part of a crowdsourcing platform, answer aggregation aims to obtain high-quality final results by aggregating and integrating the answers of a large number of workers. Answer integration usually considers both worker reliability and task difficulty. For reliability, some researchers rely on the accuracy of a worker's historical answers, some on the worker's expertise in a particular domain, and others on the worker's answering trajectory, such as mouse movements or dwell time. Few studies consider how a worker's own characteristics, such as behavioral tendencies or personality traits, affect reliability.

This thesis holds that a worker's own characteristics affect the accuracy of that worker's answers. On this basis, it proposes two worker reliability models: one based on the worker's behavioral tendency and motivation, and one based on the worker's personality traits. Taking the cost of collecting answers into account, it also proposes two answer aggregation algorithms: a general answer aggregation algorithm and a filtered answer aggregation algorithm. The general algorithm needs answers from as many workers as possible and aggregates the answers of all workers. The filtered algorithm uses reliability to identify excellent workers and aggregates only their answers, which effectively lowers the cost of collecting answers. The main work of this thesis is as follows.

First, this thesis proposes a general answer aggregation algorithm that considers workers' behavioral tendencies and task difficulty. The algorithm first classifies workers according to their characteristics, analyzes worker behavior for each class, and builds a probabilistic model of worker answers. The model applies a sigmoid function to the gap between a worker's skill level and a task's difficulty level to express the probability that the worker answers the task correctly, and it defines the response distribution of each type of worker according to the worker's motivation and personality traits. An improved EM algorithm is then used to estimate the aggregated answer of each task, together with each worker's type, each worker's skill level, and each task's difficulty level. This method needs as many workers as possible to answer each question in order to ensure accuracy, and the more workers are needed, the higher the task cost. To reduce this cost, the thesis also proposes a filtered aggregation algorithm.

Second, this thesis proposes a filtered answer aggregation algorithm based on the overall reliability of workers. The algorithm is built on a probabilistic model in which the overall reliability of a worker's responses is divided into two parts: skill in a particular domain and the worker's own reliability. Domain skill is modeled mainly through sparse coding of task features, while the worker's own reliability is represented by the worker's Big Five personality traits, measured with the internationally standard BFI (Big Five Inventory) test. The algorithm iteratively computes each worker's skill and reliability to obtain the worker's overall reliability, then ranks workers by overall reliability and selects high-quality workers. The algorithm therefore not only analyzes worker reliability in terms of the workers' own characteristics but also models that reliability explicitly, and it effectively reduces task cost by selecting excellent workers.

The two algorithms are tested on a real dataset and a simulated dataset, respectively, and the results verify their correctness and usefulness. The algorithms are also applied on an experimental crowdsourcing platform, which provides guidance for their practical application.
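The probability model of the general algorithm can be illustrated with a simplified sketch. It assumes binary answers, a single worker type, and a uniform prior over each task's true label, so it leaves out the behavioral-tendency classes and the motivation- and personality-specific response distributions described above; it is plain (generalized) EM, not the thesis's improved EM. The function names, the single gradient step in the M-step, the learning rate `lr`, and the L2 penalty `reg` are all assumptions of this sketch. The correctness probability sigmoid(skill - difficulty) has the same form as the Rasch (one-parameter logistic) model from item response theory.

```python
import math

def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def p_correct(skill: float, difficulty: float) -> float:
    """Assumed model: P(correct) = sigmoid(skill - difficulty), clamped away from 0 and 1."""
    return min(max(sigmoid(skill - difficulty), 1e-9), 1.0 - 1e-9)

def em_aggregate(labels, n_workers, n_tasks, iters=50, lr=0.1, reg=0.01):
    """Simplified EM aggregation for binary answers (hypothetical helper).

    labels: list of (worker, task, answer) triples, answer in {0, 1}.
    Returns (posterior P(true label = 1) per task, worker skills, task difficulties).
    """
    skill = [1.0] * n_workers        # start better than chance to break label symmetry
    difficulty = [0.0] * n_tasks
    post = [0.5] * n_tasks
    for _ in range(iters):
        # E-step: log-likelihood of each task's answers under z_j = 1 and z_j = 0;
        # with a uniform prior the posterior is the logistic of their difference.
        log1 = [0.0] * n_tasks
        log0 = [0.0] * n_tasks
        for i, j, a in labels:
            p = p_correct(skill[i], difficulty[j])
            log1[j] += math.log(p if a == 1 else 1.0 - p)
            log0[j] += math.log(p if a == 0 else 1.0 - p)
        post = [sigmoid(l1 - l0) for l1, l0 in zip(log1, log0)]

        # M-step: one gradient-ascent step on the expected complete-data
        # log-likelihood, with a small L2 penalty to fix the shift invariance.
        g_skill = [-reg * s for s in skill]
        g_diff = [-reg * d for d in difficulty]
        for i, j, a in labels:
            w = post[j] if a == 1 else 1.0 - post[j]   # P(this answer is correct)
            r = w - sigmoid(skill[i] - difficulty[j])
            g_skill[i] += r
            g_diff[j] -= r
        skill = [s + lr * g for s, g in zip(skill, g_skill)]
        difficulty = [d + lr * g for d, g in zip(difficulty, g_diff)]
    return post, skill, difficulty

if __name__ == "__main__":
    truth = [1, 0, 1, 1, 0]
    # Toy data: workers 0-2 always answer correctly; worker 3 always answers wrongly.
    labels = [(i, j, t if i < 3 else 1 - t) for j, t in enumerate(truth) for i in range(4)]
    post, skill, _ = em_aggregate(labels, n_workers=4, n_tasks=5)
    print([round(p) for p in post])   # → [1, 0, 1, 1, 0]
    print([round(s, 2) for s in skill])
```

Because the model is unchanged when the same constant is added to every skill and difficulty, the small penalty pins the scale. A worker whose estimated skill falls below a task's difficulty is modeled as answering that task worse than chance, which is how the toy run ends up discounting worker 3.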
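The rank-and-select step of the filtered algorithm can be sketched as follows. The sketch assumes that each worker's per-domain skills and personality-based reliability have already been estimated (the thesis computes them iteratively, with the latter derived from BFI scores), and that a task's sparse code is given as a small dictionary of its nonzero domain weights. The combination rule (code-weighted domain skill multiplied by personality-based reliability), the top-`k` cutoff, the reliability-weighted vote, and all function names are hypothetical choices for illustration, not the thesis's exact formulation.

```python
from collections import defaultdict

def overall_reliability(skill, task_code, personality_rel):
    """Hypothetical rule: domain skill weighted by the task's sparse code,
    scaled by a personality-based reliability in [0, 1]."""
    domain_skill = sum(skill.get(d, 0.0) * wt for d, wt in task_code.items())
    return domain_skill * personality_rel

def select_workers(candidates, skills, personality, task_code, k):
    """Rank candidate workers by overall reliability and keep the top k,
    returned as (worker_id, reliability) pairs in descending order."""
    scored = [(w, overall_reliability(skills[w], task_code, personality[w]))
              for w in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

def weighted_vote(answers, weights):
    """Aggregate the selected workers' answers, weighting each vote by reliability."""
    votes = defaultdict(float)
    for w, ans in answers.items():
        votes[ans] += weights[w]
    return max(votes, key=votes.get)

if __name__ == "__main__":
    skills = {"a": {"art": 0.9}, "b": {"art": 0.2, "math": 0.9}, "c": {"art": 0.7}}
    personality = {"a": 0.8, "b": 0.9, "c": 0.6}   # assumed BFI-derived reliabilities
    art_task = {"art": 1.0}                          # toy sparse code of an art-appraisal task
    chosen = select_workers(list(skills), skills, personality, art_task, k=2)
    print([w for w, _ in chosen])                    # → ['a', 'c']
    answers = {"a": "high", "c": "low"}              # only the chosen workers are asked
    print(weighted_vote(answers, dict(chosen)))      # → high
```

Only the `k` selected workers need to be asked, which is where the cost saving comes from. In this toy example, worker `b`, whose skill lies mainly in math, is skipped for the art task even though its personality-based score is the highest.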
Keywords: Crowdsourcing, worker characteristics, quality control, EM-based approach, answer aggregation