Research On Chinese Word Sense Disambiguation Based On Neural Networks

Posted on:2020-10-08

Degree:Master

Type:Thesis

Country:China

Candidate:X Y Cheng

Full Text:PDF

GTID:2428330575457601

Subject:Engineering

Abstract/Summary:

Lexical ambiguity is an intrinsic feature of natural language, and word sense disambiguation (WSD) is one of the basic tasks of natural language processing (NLP). Improved WSD would benefit a variety of downstream tasks and applications, such as information retrieval, machine translation, and information extraction. Word embeddings trained on large-scale corpora contain rich semantic and syntactic information, and adding them to a WSD model can improve its precision. With the development of neural networks and the growth of computing power, neural networks have made significant progress on many NLP tasks. However, research on Chinese WSD based on neural networks is relatively rare, and existing work neglects external knowledge about the target words. This thesis therefore studies the Chinese WSD task using both a statistical machine learning method and a neural network method. The specific work is as follows:

(1) We propose a Chinese WSD model based on a Support Vector Machine (SVM) combined with word embeddings. Context word features and part-of-speech features represented by word embeddings are used as the input of the SVM classifier, in place of the complex features used in statistical machine learning methods. We use Chinese word embeddings trained with the ngram2vec model, in which character features and n-gram features are added to the context features during embedding training. Evaluated by macro-average precision, the model achieves 80.44% on the SemEval-2007 Task 5 (Multilingual Chinese-English Lexical Sample) data, an increase of 2.56% over the best previous result obtained with machine learning methods. On the Chinese word sense annotation corpus constructed by Zhengzhou University, the model reaches a micro-average precision of 83.18%.

(2) We propose a Chinese WSD model based on linguistic knowledge and neural networks. First, we propose a Chinese WSD model based on Bidirectional Long Short-Term Memory (Bi-LSTM). The model uses a Bi-LSTM to model the contextual semantic information of the target word and classifies the word sense with a softmax function. We then build a neural Chinese WSD model that incorporates dictionary information: the glosses and example sentences in the dictionary are added to the neural network as external knowledge to assist the sense decision. The model uses two Bi-LSTMs to model the context information and the dictionary information of the target word respectively, and an attention mechanism to model the semantic relationship between them. Finally, the context information and dictionary information are combined to perform Chinese word sense disambiguation. The model achieves a macro-average accuracy of 85.28% on the SemEval-2007 corpus.
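The abstract reports both macro-average and micro-average scores, and the two can differ noticeably on a lexical-sample corpus. The following is a minimal sketch, not the thesis's evaluation script. It assumes the usual lexical-sample convention: the micro-average is the fraction of all test instances labelled correctly, and the macro-average is the mean of the per-target-word precision values. The function name `micro_macro_precision`, the record layout, and the toy data are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def micro_macro_precision(records):
    """Score WSD predictions.

    records: iterable of (target_word, gold_sense, predicted_sense) tuples.
    Returns (micro, macro), where micro pools all instances and macro
    averages the per-target-word precision. Assumes every instance
    receives exactly one predicted sense.
    """
    correct = defaultdict(int)  # correct predictions per target word
    total = defaultdict(int)    # test instances per target word
    for word, gold, pred in records:
        total[word] += 1
        if gold == pred:
            correct[word] += 1
    if not total:
        raise ValueError("no records to score")
    micro = sum(correct.values()) / sum(total.values())
    macro = sum(correct[w] / total[w] for w in total) / len(total)
    return micro, macro


# Toy data (hypothetical): two ambiguous target words with English sense labels.
records = [
    ("打", "hit", "hit"),
    ("打", "hit", "play"),
    ("打", "play", "play"),
    ("打", "play", "play"),
    ("花", "flower", "flower"),
    ("花", "spend", "flower"),
]
micro, macro = micro_macro_precision(records)
print(f"micro = {micro:.4f}, macro = {macro:.4f}")
```

In this toy data the first word has precision 3/4 and the second 1/2, so the macro-average weights them equally while the micro-average is pulled toward the word with more instances. For that reason the two figures in the abstract, which are also reported on different corpora, are not directly comparable.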

Keywords/Search Tags:

WSD, Dictionary, Bi-LSTM, Word embedding, SVM

Related items

1	Research On Chinese Zero Pronoun Resolution Based On Word Embedding And LSTM
2	Application Research Of LSTM Network Based On Word Embedding In Music Recommendation
3	Bi-LSTM Commodity Recommendation System Based On Word Embedding
4	Research On Multi-granularity Chinese Word Embedding Based On Glyph Structure
5	Research On Sentiment Embedding Model Based On The Value Of Sentimental Word Intensity
6	Web Access Behavior Analysis And Study Based On Word Embedding Technology
7	Research On Text Classification Method Based On Bidirectional LSTM
8	Based On Dictionary And Word Frequency Analysis Of The Unknown Words From The BBS Of Corpus Recognition Research
9	Research On Chinese Word Segmentation Based On Deep Learning
10	Research On Chinese Speech Transcription Punctuation Prediction Based On Deep Learning