
Research On Chinese Word Sense Disambiguation Based On Neural Networks

Posted on: 2020-10-08 | Degree: Master | Type: Thesis
Country: China | Candidate: X Y Cheng
GTID: 2428330575457601 | Subject: Engineering
Abstract/Summary:
Lexical ambiguity is an intrinsic feature of natural language, and word sense disambiguation (WSD) is one of the basic tasks of natural language processing (NLP). Improved WSD would benefit a variety of downstream tasks and applications, such as information retrieval, machine translation, and information extraction. Word embeddings trained on large-scale corpora contain rich semantic and syntactic information, and adding them to a WSD model can improve its precision. With the development of neural networks and the growth of computing power, neural networks have made significant progress on many NLP tasks. However, research on Chinese WSD based on neural networks is relatively rare, and existing work neglects external knowledge about the target words. Therefore, this thesis studies the Chinese WSD task using both a statistical machine learning method and a neural network method. The specific work is as follows:

(1) We propose a Chinese WSD model based on a Support Vector Machine (SVM) combined with word embeddings. Context word features and part-of-speech features represented by word embeddings replace the complex features used in statistical machine learning methods as the input to the SVM classifier. We use Chinese word embeddings trained with the ngram2vec model; character features and n-gram features are added to the context features during embedding training. Evaluated by macro-average precision, the model achieves 80.44% on the SemEval-2007 Task 5 (Multilingual Chinese-English Lexical Sample) data, 2.56% higher than the best previous result of a machine learning method. On the Chinese word sense annotation corpus constructed by Zhengzhou University, the model reaches a micro-average precision of 83.18%.

(2) We propose a Chinese WSD model based on linguistic knowledge and neural networks. First, we propose a Chinese WSD model based on Bidirectional Long Short-Term Memory (Bi-LSTM). The model uses a Bi-LSTM to model the contextual semantic information of the target word and classifies the word sense with a softmax function. Then we build a Chinese neural WSD model that incorporates dictionary information: the glosses and example sentences in the dictionary are added to the neural network as external knowledge to assist sense judgment. The model uses two Bi-LSTMs to model the context information and the dictionary information of the target word respectively, and uses an attention mechanism to model the semantic relationship between them. Finally, the context information and dictionary information are combined to perform Chinese word sense disambiguation. The model achieves a macro-average accuracy of 85.28% on the SemEval-2007 corpus.
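
The abstract reports both macro-average and micro-average precision without defining them. The sketch below shows the usual lexical-sample reading of the two metrics: micro-average pools all test instances, while macro-average first scores each target word and then averages over words. This is an illustration, not the thesis's evaluation script. The function name `micro_macro_precision`, the record layout, and the toy data are hypothetical, and the sketch assumes every instance receives exactly one predicted sense, so precision coincides with accuracy.

```python
from collections import defaultdict


def micro_macro_precision(records):
    """Compute (micro, macro) precision for a lexical-sample WSD run.

    records: iterable of (target_word, gold_sense, predicted_sense).
    Assumes every instance has exactly one prediction (hypothetical setup).
    """
    correct = defaultdict(int)  # correct predictions per target word
    total = defaultdict(int)    # test instances per target word
    for word, gold, pred in records:
        total[word] += 1
        if gold == pred:
            correct[word] += 1
    if not total:
        raise ValueError("no records to evaluate")

    # Micro-average: pool every instance, so frequent words weigh more.
    micro = sum(correct.values()) / sum(total.values())
    # Macro-average: score each target word, then average with equal weight.
    macro = sum(correct[w] / total[w] for w in total) / len(total)
    return micro, macro


# Invented toy data: two target words with different numbers of instances.
toy = [
    ("bank", "s1", "s1"),
    ("bank", "s1", "s1"),
    ("bank", "s2", "s1"),
    ("bank", "s2", "s2"),
    ("spring", "s1", "s2"),
    ("spring", "s1", "s1"),
]
micro, macro = micro_macro_precision(toy)
print(f"micro={micro:.4f} macro={macro:.4f}")
```

Because the two metrics weight target words differently, a macro-average figure on one corpus and a micro-average figure on another are not directly comparable.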
Keywords/Search Tags: WSD, Dictionary, Bi-LSTM, Word embedding, SVM