Research On Word Sense Disambiguation Based On K-means Cluster And LSTM

Posted on:2021-02-17

Degree:Master

Type:Thesis

Country:China

Candidate:X S Zhou

Full Text:PDF

GTID:2428330605972932

Subject:Computer Science and Technology

Abstract/Summary:

Chinese contains many ambiguous words, which can express different meanings in different contexts. Word sense disambiguation (WSD) was proposed as a task for applying computers to natural language processing (NLP): the aim is for an algorithm to interpret the context and automatically select the correct sense of an ambiguous word. WSD helps computers understand and use natural language accurately, and it has been widely applied in machine translation, text classification, and other tasks. WSD is therefore an important and pressing problem in NLP.

This thesis presents a WSD method based on K-means clustering and LSTM (Long Short-Term Memory). A semi-supervised K-means clustering procedure merges unlabeled corpus data into the labeled sense classes. The merged data are then added to the training corpus to optimize the LSTM model, whose performance is evaluated on a test corpus. The thesis covers three research aspects, as follows.

Firstly, the current state and development of WSD research in China and abroad are introduced through a review of the WSD literature. The objective and significance of WSD are clarified, and its difficulties and future development trends are summarized.

Secondly, the synonym thesaurus Tongyici Cilin ("synonym word forest") and the corpus required for the experiments are introduced. Based on a study of WSD feature engineering, the extraction processes for the clustering features and the disambiguation features are determined. The disambiguation processes of a Bayesian classifier and an LSTM classifier are described in detail.

Finally, the process by which semi-supervised K-means clustering merges the unlabeled corpus is introduced. Several cluster centers are selected from the labeled corpus. Each unlabeled instance is then taken in turn and its distance to every cluster center is calculated. If its distance to some cluster center is below a threshold, the instance is removed from the unlabeled data and placed into the class to which that cluster center belongs. After the distances from every unlabeled instance to every cluster center have been calculated, the cluster centers in the labeled corpus are updated. This process is repeated until the cluster centers no longer change. Adding the clustered data to the training corpus yields an extended training corpus, which is used to train the LSTM model. The optimized LSTM classifier is then evaluated on the test corpus. Experimental results show that the proposed method achieves higher disambiguation performance than the LSTM classifier, the DBN classifier, and the Bayesian classifier.
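The merging loop described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: each context is assumed to already be a numeric feature vector (the abstract does not specify the clustering features), distance is assumed to be Euclidean, the initial center of each sense is assumed to be the mean of its labeled vectors, and when several centers fall under the threshold the nearest one is assumed to win. The function names `semi_supervised_kmeans` and `extend_training_corpus`, the sense labels, and the `threshold` value are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    # Assumed distance metric; the abstract does not name one.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(vectors: List[Vector]) -> Vector:
    # Component-wise mean of a non-empty list of equal-length vectors.
    return tuple(sum(col) / len(vectors) for col in zip(*vectors))


def semi_supervised_kmeans(
    labeled: Dict[str, List[Vector]],
    unlabeled: List[Vector],
    threshold: float,
    max_iter: int = 100,
) -> Tuple[Dict[str, List[Vector]], List[Vector]]:
    """Hypothetical sketch: grow each labeled sense class with nearby unlabeled vectors."""
    # One cluster per sense; its initial center is the mean of its labeled vectors (assumption).
    clusters = {sense: list(vecs) for sense, vecs in labeled.items()}
    centers = {sense: mean(vecs) for sense, vecs in clusters.items()}
    remaining = list(unlabeled)

    for _ in range(max_iter):
        still_unlabeled: List[Vector] = []
        for vec in remaining:
            # Distance from this unlabeled vector to every center; keep the nearest (assumption).
            sense, dist = min(
                ((s, euclidean(vec, c)) for s, c in centers.items()),
                key=lambda pair: pair[1],
            )
            if dist < threshold:
                clusters[sense].append(vec)  # move it into that center's sense class
            else:
                still_unlabeled.append(vec)  # keep it for the next pass
        remaining = still_unlabeled

        # Update centers only after every unlabeled vector has been checked.
        new_centers = {sense: mean(vecs) for sense, vecs in clusters.items()}
        if new_centers == centers:  # centers no longer change: stop
            break
        centers = new_centers

    return clusters, remaining


def extend_training_corpus(clusters: Dict[str, List[Vector]]) -> List[Tuple[Vector, str]]:
    # Flatten the grown clusters into (feature vector, sense label) training pairs.
    return [(vec, sense) for sense, vecs in clusters.items() for vec in vecs]


if __name__ == "__main__":
    labeled = {
        "sense_a": [(0.0, 0.0), (0.2, 0.0)],
        "sense_b": [(5.0, 5.0), (5.0, 5.2)],
    }
    unlabeled = [(0.1, 0.3), (4.8, 5.1), (2.5, 2.5)]
    clusters, leftover = semi_supervised_kmeans(labeled, unlabeled, threshold=1.0)
    print(len(extend_training_corpus(clusters)), leftover)  # → 6 [(2.5, 2.5)]
```

In this toy run the two points close to a sense center are absorbed in the first pass, while (2.5, 2.5) stays unlabeled because it is farther than the threshold from both centers. The extended corpus would then serve as LSTM training data; the LSTM classifier itself is not sketched here.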

Keywords/Search Tags:

word sense disambiguation, K-means cluster, LSTM classifier, disambiguation features

Related items

1	Research On Word Sense Disambiguation Based On DBN
2	Research Of Word Sense Disambiguation Based On Hybrid Features And Rules
3	An Unsupervised Approach To Word Sense Disambiguation Based On Second-order Context
4	Research On Statistical Method Of Chinese Word Meaning Disambiguation Based On Multi-Classifier
5	Chinese Word Sense Disambiguation Based On Transformer And LSTM Model
6	A Study Of Chinese Word Sense Disambiguation Based On Hownet
7	Based On Semi-supervised Method Of Chinese Word Sense Disambiguation
8	Chinese Word Sense Disambiguation Based On Parsing Tree
9	Research On Word Sense Disambiguation Based On The Strategy Of Field Priority Selection
10	Research On Chinese Word Sense Disambiguation Method Based On Deep Learning