
Research On The Application Of Semi-supervised Learning In Natural Language Processing

Posted on: 2015-04-20
Degree: Master
Type: Thesis
Country: China
Candidate: X Zhou
Full Text: PDF
GTID: 2298330422990911
Subject: Computer Science and Technology
Abstract/Summary:
The development of natural language processing techniques has brought considerable convenience to people's lives. Supervised learning approaches have achieved great success in natural language processing, but they are difficult to extend to tasks with scarce labeled data because they rely on large labeled corpora. Many natural language processing tasks lack labeled corpora, whereas unlabeled corpora are easy to obtain; in such cases semi-supervised learning is a good choice. Semi-supervised learning uses a large collection of unlabeled data together with a few labeled examples to improve generalization performance, and it has proved useful in real-world applications. The bottleneck of existing semi-supervised approaches is the high computational cost caused by large-scale unlabeled data.

In this thesis we focus on how to apply two families of semi-supervised learning approaches, one based on active learning and one based on anchor graphs, to natural language processing tasks. First, we propose a semi-supervised support vector machine framework based on active learning. It achieves linear time and space complexity by using averaged stochastic gradient descent (ASGD) to solve the model and by focusing on a rational active learning strategy. Our results on text classification and sentiment classification show that our approach achieves performance comparable to other mainstream semi-supervised support vector machines while significantly improving training speed. The learning framework can also be extended to other semi-supervised learning models.

Next, we study how to apply graph-based semi-supervised learning algorithms to part-of-speech (POS) tagging. We first apply anchor-based label propagation to POS tagging, focusing on the data sparsity problem in natural language processing and discussing how to use word embedding features. The results suggest that anchor-based label propagation using word embedding context features can improve the accuracy of POS tagging. We then compare and analyze the two styles of semi-supervised learning algorithms in terms of theory, basic assumptions, time and space complexity, and the features they fit, and give suggestions on how to choose between them. Finally, we apply a graph-based semi-supervised learning approach to cross-lingual POS tagging to estimate the tags of unaligned words. The results suggest that our approach improves upon the traditional baseline.
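The abstract names averaged stochastic gradient descent (ASGD) as the solver that keeps training linear in time and space. The following is a minimal sketch of that idea for a plain supervised linear SVM with hinge loss; it is not the thesis's semi-supervised objective or its active learning strategy. The function names, the step-size schedule, the hyperparameter values, and the toy data are all assumptions made for illustration.

```python
import random


def asgd_linear_svm(samples, labels, lam=0.01, eta0=0.1, epochs=20, seed=0):
    """Train a linear SVM (hinge loss + L2 penalty) with averaged SGD.

    Hypothetical helper for illustration only. Labels must be +1 or -1.
    Returns the running average of the weight iterates.
    """
    rng = random.Random(seed)
    dim = len(samples[0])
    w = [0.0] * dim        # current SGD iterate
    w_avg = [0.0] * dim    # running average of all iterates so far
    order = list(range(len(samples)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            # Decaying step size (an assumed schedule, not from the thesis).
            eta = eta0 / (1.0 + lam * eta0 * t)
            x, y = samples[i], labels[i]
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            for j in range(dim):
                # Subgradient of lam/2 * ||w||^2 + max(0, 1 - y * w.x)
                grad = lam * w[j] - (y * x[j] if margin < 1.0 else 0.0)
                w[j] -= eta * grad
            # Incremental mean: one pass, O(dim) extra memory.
            for j in range(dim):
                w_avg[j] += (w[j] - w_avg[j]) / t
    return w_avg


def predict(w, x):
    """Return +1 or -1 from the sign of the linear score."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


# Toy data: one real feature plus a constant bias feature of 1.0.
toy_samples = [[4.0, 1.0], [3.0, 1.0], [2.0, 1.0],
               [-2.0, 1.0], [-3.0, 1.0], [-4.0, 1.0]]
toy_labels = [1, 1, 1, -1, -1, -1]
toy_weights = asgd_linear_svm(toy_samples, toy_labels)
```

Each update touches one example and the average is maintained incrementally, so the cost per epoch grows linearly with the number of examples and memory stays proportional to the feature dimension. A semi-supervised variant would additionally need a loss term for unlabeled examples, which this sketch does not attempt.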
Keywords/Search Tags: active learning, semi-supervised learning, semi-supervised support vector machines, cross-lingual part-of-speech tagging, label propagation