Research On The Construction Of Uygur, Kazak And Kirgiz Public Opinion Tagging Corpus Based On Crowdsourcing

Posted on: 2016-09-20
Degree: Master
Type: Thesis
Country: China
Candidate: H Chen
GTID: 2308330476950398
Subject: Computer application technology
Abstract/Summary:
Large-scale annotated corpora play an important role in both semantic analysis and algorithm research in NLP, and the annotation work itself requires only simple human intelligence. To address the lack of public opinion tagging corpora for the Uygur, Kazak, and Kirgiz languages, this thesis proposes a crowdsourcing-based method for annotating public opinion corpora in these three languages. Before annotation began, emotion and event tagging specifications were formulated from the characteristics of the three languages combined with the annotation specifications of resource-rich languages. Based on these specifications, a corpus tagging platform with a three-layer architecture was built to guarantee the scalability of the system. The platform gives multiple users access to the annotation work, and the research puts forward an error correction mechanism and quality control strategies. Crowdsourcing is applied to data collection, data annotation, and data evaluation. Analysis of the crowdsourcing results shows that, although the emotion categories are not evenly represented in number, the emotional strength within each emotion is evenly distributed. To ensure annotation quality, the research proposes a combined crowdsourcing assessment method that evaluates both the crowdsourcing users and the data. The results show that the proposed method meets the expectations of the research. The Kappa value is used to estimate the consistency of each user in crowdsourcing, and it shows that the users remain consistent in their annotation work. The resulting Uygur, Kazak, and Kirgiz public opinion tagging corpus can provide a powerful resource for research on public opinion among national minorities.
Keywords/Search Tags: Crowdsourcing, Public Opinion, Corpus Tagging, Quality Control
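
The Kappa-based consistency check mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which Kappa variant the thesis uses, so this example assumes Cohen's kappa for a single pair of annotators; the function name `cohen_kappa` and the emotion labels are hypothetical and are not taken from the thesis.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items.

    Assumption: the thesis may use a different Kappa variant (for example,
    a multi-rater one); this is only the two-annotator case.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")

    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: probability that both pick the same label at random,
    # given each annotator's own label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)

    if expected == 1.0:
        # Both annotators used one identical label throughout; treat as full agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical emotion tags from two crowdsourcing users on four sentences.
user_1 = ["joy", "joy", "anger", "anger"]
user_2 = ["joy", "anger", "anger", "anger"]
kappa = cohen_kappa(user_1, user_2)
```

Values near 1 indicate agreement well beyond chance, while values near 0 indicate agreement no better than chance. With more than two annotators per item, a multi-rater statistic would be the more natural choice.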