
Semi-supervised Structured Learning For Pos-tag Projection Across Languages

Posted on: 2013-08-12  Degree: Master  Type: Thesis
Country: China  Candidate: P L Hu  Full Text: PDF
GTID: 2268330392467974  Subject: Computer Science and Technology
Abstract/Summary:
Natural language processing (NLP) has achieved great success in the information age and has become part of everyday life. Growing cultural exchange creates demand for NLP in minority languages, but these languages lack labeled corpora, which limits the development of NLP techniques for them. Cross-lingual projection methods address this by using resource-rich languages to help learning in resource-poor languages.

In this thesis, we apply several semi-supervised structured learning algorithms that use word alignments to help POS-tag projection. We define the cross-lingual projection problem as a semi-supervised structured learning problem, and all the proposed methods are incorporated into this framework. We first propose a direct projection algorithm, which projects the POS tags of the source language directly onto the target language via word alignment. We then consider algorithms for two settings: no labeled target-language data, and a small amount of labeled target-language data. We also study word alignment filtering and use two filtering methods, both of which improve cross-lingual projection accuracy.

We also use the co-training framework to solve the cross-lingual projection problem, extending co-training to structured learning, and we investigate confidence metrics for the sequence labeling model and the influence of different alignment types. The experiments show that using one-to-one word alignments together with a piece-based training data update strategy gives better results.

Finally, we use the label propagation algorithm to reduce the noise introduced by direct projection. The similarity graph is built from the context features of each word; in this process, we apply singular value decomposition for feature reduction to lessen the impact of feature sparsity. We then use the POS-tag distributions inferred by label propagation to constrain the Markov Random Fields. The experiments show that the co-training and label propagation algorithms succeed in the POS-tag projection task, outperforming both direct projection and supervised methods trained on small amounts of labeled data.
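
As an illustration of the direct projection step with one-to-one alignment filtering, the following is a minimal sketch, not the thesis implementation. It assumes alignments are given as (source index, target index) pairs and that unaligned target words receive no tag; the names one_to_one and project_tags and the example tag set are hypothetical.

```python
from collections import Counter


def one_to_one(alignments):
    """Keep only links whose source index and target index each occur once.

    `alignments` is a list of (source_index, target_index) pairs
    (an assumed input format, not taken from the thesis).
    """
    src_counts = Counter(s for s, _ in alignments)
    tgt_counts = Counter(t for _, t in alignments)
    return [
        (s, t)
        for s, t in alignments
        if src_counts[s] == 1 and tgt_counts[t] == 1
    ]


def project_tags(source_tags, target_len, alignments, unknown=None):
    """Copy each source POS tag to its aligned target word.

    Target words without a one-to-one link keep the `unknown` placeholder.
    """
    projected = [unknown] * target_len
    for s, t in one_to_one(alignments):
        projected[t] = source_tags[s]
    return projected


# Hypothetical example: a 3-word source sentence aligned to 4 target words.
source_tags = ["DET", "NOUN", "VERB"]
alignments = [(0, 0), (1, 1), (2, 2), (2, 3)]  # source word 2 links to two targets
projected = project_tags(source_tags, 4, alignments)
print(projected)
```

In this sketch the one-to-many links from source word 2 are discarded by the filter, so target words 2 and 3 stay untagged. Gaps of this kind are what a later step, such as the label propagation or co-training described above, would need to fill.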
Keywords/Search Tags: pos-tagging, semi-supervised learning, cross-lingual projection, co-training, label propagation