Research On The Application Of Semi-supervised Learning In Natural Language Processing

Posted on:2015-04-20

Degree:Master

Type:Thesis

Country:China

Candidate:X Zhou


GTID:2298330422990911

Subject:Computer Science and Technology

Abstract/Summary:


The development of natural language processing (NLP) techniques has brought great convenience to people's lives. Supervised learning approaches have achieved great success in NLP, but because they rely on large labeled corpora, they are difficult to extend to tasks where labeled data is scarce. Many NLP tasks lack labeled corpora while unlabeled text is very easy to obtain, and in such cases semi-supervised learning is a good choice. Semi-supervised learning tries to exploit a large collection of unlabeled data together with a few labeled examples to improve generalization performance, and it has proved useful in real-world applications. The bottleneck of existing semi-supervised approaches is their high computational cost on large-scale unlabeled data.

In this thesis we focus on how to apply two kinds of semi-supervised learning effectively to NLP tasks: approaches based on active learning and approaches based on anchor graphs. First, we propose a semi-supervised support vector machine (S3VM) framework based on active learning. It runs in linear time and space because the model is solved with averaged stochastic gradient descent (ASGD), and it focuses on a rational active learning strategy. Our results on text classification and sentiment classification show that the approach is as effective as other mainstream S3VMs while training significantly faster. The framework can also be extended to other semi-supervised learning models.

Then, we study how to apply graph-based semi-supervised learning algorithms to part-of-speech (POS) tagging. We first apply anchor-based label propagation to POS tagging, focusing on the data sparseness problem common in NLP, and discuss how to use word embedding features.
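The anchor-based label propagation step can be illustrated with a minimal sketch. It assumes the anchor graph formulation of Liu, He and Chang (2010), in which each point is linked only to its s nearest anchors and labels are solved for on the m anchors; the abstract does not say which anchor construction, kernel, or regularizer the thesis uses. The function names, the Gaussian kernel, `gamma`, and the small ridge term are hypothetical, and in POS tagging the rows of `X` would be per-token context features (for example, word embedding context vectors) rather than the toy 2-D points used here.

```python
import numpy as np


def anchor_weights(X, anchors, s=2, sigma=1.0):
    """Data-to-anchor matrix Z (n x m): each row keeps Gaussian-kernel weights
    to its s nearest anchors and is normalized to sum to 1."""
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2)  # (n, m)
    nearest = np.argsort(d2, axis=1)[:, :s]
    rows = np.arange(X.shape[0])[:, None]
    d_near = d2[rows, nearest]
    # Subtracting the row minimum avoids underflow; it cancels after normalizing.
    w = np.exp(-(d_near - d_near.min(axis=1, keepdims=True)) / (2 * sigma ** 2))
    Z = np.zeros_like(d2)
    Z[rows, nearest] = w / w.sum(axis=1, keepdims=True)
    return Z


def anchor_label_propagation(Z, labeled_idx, y_labeled, n_classes, gamma=0.01):
    """Propagate seed labels through the anchor graph W = Z diag(Z^T 1)^-1 Z^T.

    The linear solve involves only m x m matrices, so cost grows linearly in n.
    """
    Zl = Z[labeled_idx]
    Yl = np.eye(n_classes)[y_labeled]             # one-hot seed labels (l x c)
    lam = Z.sum(axis=0)                           # anchor degrees
    ZtZ = Z.T @ Z
    L_red = ZtZ - ZtZ @ np.diag(1.0 / lam) @ ZtZ  # reduced Laplacian Z^T L Z
    ridge = 1e-8 * np.eye(Z.shape[1])             # numerical safeguard (assumption)
    A = np.linalg.solve(Zl.T @ Zl + gamma * L_red + ridge, Zl.T @ Yl)
    F = Z @ A                                     # soft labels for all n points
    F = F / F.sum(axis=0, keepdims=True)          # per-class column normalization
    return F.argmax(axis=1)


# Toy example: two clusters, one labeled seed each (all data hypothetical).
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1]])
anchors = X[[0, 3, 4, 7]]  # in practice anchors are usually k-means centers
Z = anchor_weights(X, anchors, s=2)
pred = anchor_label_propagation(Z, [0, 4], [0, 1], n_classes=2)
print(pred)  # expected: [0 0 0 0 1 1 1 1]
```

Since every point connects only to a few anchors, the graph never needs a full n × n similarity matrix, which is what lets this style of propagation scale to large unlabeled corpora.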
The results suggest that anchor-based label propagation using word embedding context features can improve the accuracy of POS tagging. We then compare and analyze the two styles of semi-supervised learning algorithm in terms of theory, basic assumptions, time and space complexity, and the features each is suited to, and give suggestions on how to choose between them. Finally, we apply a graph-based semi-supervised learning approach to cross-lingual POS tagging to estimate the tags of unaligned words. The results suggest that our approach improves on the traditional baseline.
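For the time-and-space comparison on the S3VM side, a minimal sketch may help. It assumes a linear S3VM with the common symmetric-hinge loss on unlabeled points, trained by SGD with iterate averaging (ASGD) under a simplified step-size schedule, and plain uncertainty sampling as the active learning strategy. The abstract does not specify the thesis's actual objective, schedule, or query strategy, so `asgd_s3vm`, `active_s3vm`, `toy_oracle`, and all hyperparameters are hypothetical. Each ASGD pass touches every example once, which is where the linear time and the O(d) extra memory come from.

```python
import numpy as np


def asgd_s3vm(X_l, y_l, X_u, lam=1e-3, lam_u=0.1, epochs=10, eta0=0.5, seed=0):
    """Linear S3VM trained with averaged SGD (illustrative objective):
        lam/2 ||w||^2 + hinge(y f(x)) on labeled points
                      + lam_u * max(0, 1 - |f(x)|) on unlabeled points.
    One epoch visits each example once: O((l + u) d) time, O(d) extra memory."""
    rng = np.random.default_rng(seed)
    X = np.vstack([X_l, X_u])
    y = np.concatenate([y_l, np.zeros(len(X_u))])  # 0 marks "unlabeled"
    w, b = np.zeros(X.shape[1]), 0.0
    w_avg, b_avg, t = np.zeros(X.shape[1]), 0.0, 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            t += 1
            eta = eta0 / (1.0 + lam * eta0 * t)   # simplified decaying step size
            f = X[i] @ w + b
            w *= 1.0 - eta * lam                  # L2 regularizer shrinkage
            if y[i] != 0:
                if y[i] * f < 1:                  # labeled hinge subgradient
                    w += eta * y[i] * X[i]
                    b += eta * y[i]
            elif 0 < abs(f) < 1:
                # Unlabeled: symmetric hinge pushes f(x) away from the boundary.
                # The bias is left to labeled data, a crude stand-in for the
                # class-balance constraint S3VMs normally use (assumption).
                w += eta * lam_u * np.sign(f) * X[i]
            w_avg += (w - w_avg) / t              # running average of iterates
            b_avg += (b - b_avg) / t
    return w_avg, b_avg


def active_s3vm(X_l, y_l, X_u, oracle, rounds=3, k=1):
    """Alternate S3VM training with uncertainty sampling: the k unlabeled
    points closest to the hyperplane are sent to the oracle for labels."""
    for _ in range(rounds):
        if len(X_u) == 0:
            break
        w, b = asgd_s3vm(X_l, y_l, X_u)
        idx = np.argsort(np.abs(X_u @ w + b))[:k]
        X_l = np.vstack([X_l, X_u[idx]])
        y_l = np.concatenate([y_l, oracle(X_u[idx])])
        X_u = np.delete(X_u, idx, axis=0)
    return asgd_s3vm(X_l, y_l, X_u)


def toy_oracle(rows):
    """Hypothetical annotator: returns the true label of the toy data below."""
    return np.where(rows.sum(axis=1) > 0, 1.0, -1.0)


rng = np.random.default_rng(1)
X_all = np.vstack([rng.normal(2.0, 0.5, size=(50, 2)),
                   rng.normal(-2.0, 0.5, size=(50, 2))])
y_all = np.concatenate([np.ones(50), -np.ones(50)])
lab = np.array([0, 50])                  # one labeled example per class
unl = np.setdiff1d(np.arange(100), lab)
w, b = active_s3vm(X_all[lab], y_all[lab], X_all[unl], toy_oracle)
accuracy = np.mean(np.where(X_all @ w + b >= 0, 1.0, -1.0) == y_all)
print(f"accuracy = {accuracy:.2f}")
```

Leaving out an explicit class-balance constraint keeps the sketch short; without one, S3VMs can drift toward assigning most unlabeled points to a single class, so a practical implementation would add it.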

Keywords/Search Tags:

active learning, semi-supervised learning, semi-supervised support vector machines, cross-lingual part-of-speech tagging, label propagation


Related items

1	Semi-supervised Structured Learning For Pos-tag Projection Across Languages
2	Research And Application Of Image Classification Algorithm Based On Semi-supervised Learning
3	Research On Partially Labeled Problem Based On Active Learning And Semi-supervised Mechanism
4	Research Of Active Semi-supervised Clustering And Its Application In Community Detection
5	Research On Graph-based Semi-Supervised Learning Model And Classifier Design
6	Research On Laodian Participle And Part-of-speech Tagging Method
7	Research Of Reliable Semi-supervised Classification
8	Online Semi-Supervised Learning Theory,Algorithms And Applications
9	Research On Semi-supervised Clustering And Classification Algorithm
10	The Study Of Robust Semi-Supervised Classification Algorithm Based On Label Prediction And Propagation