
Research On Word Sense Disambiguation Based On Semi-supervised Model

Posted on: 2022-08-04
Degree: Master
Type: Thesis
Country: China
Candidate: L B Tang
Full Text: PDF
GTID: 2518306317489564
Subject: Computer technology
Abstract/Summary:
In natural language, the meaning a word expresses often shifts, or changes completely, from one context to another. This phenomenon is called polysemy. Word sense disambiguation (WSD) addresses polysemy: its goal is to build algorithms and models that determine the intended meaning of an ambiguous word in a given context. This thesis uses traditional machine learning models and Convolutional Neural Network (CNN) models as the basic WSD models. On top of them, it constructs two semi-supervised WSD models, one based on multi-model integration and one based on clustering. The labeled training data is first used to optimize the basic WSD models. The multi-model integration algorithm and the clustering algorithm then select high-confidence instances from the unlabeled corpus and label them. These newly annotated instances are added to the labeled corpus, which enlarges the training corpus and allows the basic WSD model to be optimized further. The process is repeated until the training corpus has reached its maximum size, at which point the basic WSD model performs best.

The thesis studies the following three aspects in detail.

First, it analyzes the research purpose, significance and application scenarios of WSD, and reviews the history and current state of WSD research in China and abroad. On this basis, it discusses the difficulties of WSD research and future research directions. It also introduces common WSD methods and analyzes the advantages, disadvantages and applicable scenarios of each.

Second, it describes the corpus data set used in the experiments and the process of extracting disambiguation features. It then presents and analyzes the disambiguation process based on three models: Support Vector Machine (SVM), Random Forest (RF) and CNN.

Third, it introduces the concept of semi-supervised learning and establishes semi-supervised WSD models based on multi-model integration and on clustering. The training data set is used to train the basic WSD model. The multi-model integration algorithm and the clustering algorithm then decide whether each unlabeled instance is a high-confidence one. If it is, the instance is added to the training data set, and the basic WSD model is trained again on the enlarged set. This process is repeated until no more unlabeled data is added to the training data.
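The iterative procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not say how multi-model confidence is computed, so the sketch assumes that an instance counts as high-confidence when all models predict the same sense. The nearest-centroid and k-nearest-neighbour classifiers are simple stand-ins for the SVM, RF and CNN models, and the function names, feature vectors and `max_rounds` parameter are hypothetical.

```python
from collections import Counter
from math import dist


def centroid_classifier(train):
    """Fit a nearest-centroid classifier and return its predict function."""
    sums, counts = {}, Counter()
    for x, y in train:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] += 1
    centroids = {y: [v / counts[y] for v in acc] for y, acc in sums.items()}
    return lambda x: min(centroids, key=lambda y: dist(x, centroids[y]))


def knn_classifier(train, k):
    """Fit a k-nearest-neighbour classifier and return its predict function."""
    def predict(x):
        nearest = sorted(train, key=lambda item: dist(x, item[0]))[:k]
        return Counter(y for _, y in nearest).most_common(1)[0][0]
    return predict


def self_train(labeled, unlabeled, max_rounds=20):
    """Grow the labeled set with instances on which all models agree.

    Assumption: unanimous agreement stands in for "high confidence".
    Stops when a round adds nothing, or after max_rounds rounds.
    """
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        # Retrain every base model on the current labeled set.
        models = [
            centroid_classifier(labeled),
            knn_classifier(labeled, 1),
            knn_classifier(labeled, 3),
        ]
        confident, remaining = [], []
        for x in unlabeled:
            votes = {model(x) for model in models}
            if len(votes) == 1:
                confident.append((x, votes.pop()))
            else:
                remaining.append(x)
        if not confident:
            break  # no unlabeled instance was added in this round
        labeled.extend(confident)
        unlabeled = remaining
    return labeled, unlabeled


# Toy usage: two senses of "bank" with made-up 2-D feature vectors.
seed = [((0.0, 0.0), "river"), ((0.2, 0.1), "river"),
        ((5.0, 5.0), "finance"), ((5.1, 4.9), "finance")]
pool = [(0.1, 0.3), (4.8, 5.2), (1.0, 1.0)]
grown, leftover = self_train(seed, pool)
```

In a real setting, the feature vectors would come from the disambiguation feature extraction step, and the agreement rule could be replaced by a probability threshold or by the clustering-based selection that the thesis also describes.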
Keywords/Search Tags: word sense disambiguation, convolutional neural network, machine learning, semi-supervised, clustering