
Design And Implementation Of Crowdsourcing Text Annotation System

Posted on: 2022-05-24
Degree: Master
Type: Thesis
Country: China
Candidate: W Z Liu
Full Text: PDF
GTID: 2518306575973989
Subject: Electronics and Communications Engineering
Abstract/Summary:
As one of the most important research areas in information technology, natural language processing (NLP) has a wide range of applications in domains such as machine translation, intelligent question answering, and text classification. NLP technology extracts structured data that machines can understand and learn from out of large volumes of narrative text. Since annotated data is a crucial part of supervised NLP, how to annotate text efficiently and accurately is an important research question.

This thesis presents a crowdsourced text annotation system. The system provides a platform for text annotation that supports various NLP tasks and different annotation domains, and can be used to build a high-quality labeled corpus efficiently in a crowdsourcing manner. The system adopts a browser/server architecture, with the front end and back end developed separately. It consists of six modules: annotation task management, text annotation, error feedback, status monitoring, truth value inference, and crowdsourcing incentive. The main contributions of this thesis are as follows:

1. In the truth value inference module, the system performs consistency detection and combines annotation results from multiple workers using three truth value inference methods: a) majority-voting-based inference, b) supervision-data-based inference, and c) annotation-efficiency-based inference (a minimal sketch of the majority-voting variant is given after the abstract). The inferred annotation truth values can be displayed for the task publisher to review, and can also be used to improve the accuracy and reliability of the labeled corpus.

2. In the crowdsourcing incentive module, the system implements an incentive mechanism that assigns a configurable score to different actions, such as task release, corpus download, and data annotation (see the scoring sketch below). This score mechanism improves users' enthusiasm for and investment in annotation, as well as the overall quality of the text annotation.

Finally, the system has been deployed on a server and can be accessed through all major browsers. Functional tests, performance tests, and browser compatibility tests have been performed on the system. Experiments show that the crowdsourced text annotation system meets text annotation requirements such as sequence annotation and text classification, and that the truth value inference and consistency detection methods are reliable. The system introduces innovations while meeting engineering needs, but there is still room for improvement. Future work includes iterative labeling methods and entity-triple labeling functions to improve labeling efficiency and broaden the system's applicability.
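The abstract names the three inference methods but does not spell out their algorithms. As an illustration only, here is a minimal Python sketch of the majority-voting variant, with a simple agreement ratio standing in for consistency detection; all function and variable names are hypothetical, not taken from the thesis:

```python
from collections import Counter

def majority_vote(labels):
    # Return the most frequent label; Counter.most_common breaks ties
    # by insertion order, i.e. by the first annotator to use the label.
    return Counter(labels).most_common(1)[0][0]

def infer_truth(annotations):
    # annotations: item id -> list of labels from different annotators.
    # Returns item id -> (inferred label, agreement ratio); the ratio is
    # a crude stand-in for the system's consistency detection.
    result = {}
    for item_id, labels in annotations.items():
        winner = majority_vote(labels)
        result[item_id] = (winner, labels.count(winner) / len(labels))
    return result

# Example: three crowd workers classify two sentences.
votes = {
    "sent-1": ["POSITIVE", "POSITIVE", "NEGATIVE"],
    "sent-2": ["NEUTRAL", "NEUTRAL", "NEUTRAL"],
}
print(infer_truth(votes))
# {'sent-1': ('POSITIVE', 0.666...), 'sent-2': ('NEUTRAL', 1.0)}
```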
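Similarly, the abstract says only that the incentive module assigns an alterable score to each action type. A hedged sketch of such a scoring mechanism follows; the action names and point values are invented for illustration and are not the values used by the deployed system:

```python
# Hypothetical point values; the thesis states only that the score per
# action is configurable, not which values the system actually uses.
ACTION_SCORES = {
    "task_release": 10,
    "data_annotation": 2,
    "corpus_download": -5,  # e.g. spending earned points on a download
}

def apply_action(balances, user, action):
    # Adjust a user's running score when they perform an action.
    balances[user] = balances.get(user, 0) + ACTION_SCORES.get(action, 0)
    return balances[user]

balances = {}
apply_action(balances, "alice", "data_annotation")
apply_action(balances, "alice", "task_release")
print(balances["alice"])  # 12
```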
Keywords/Search Tags:Crowdsourcing, text labeling, labeling quality control, web system