Chinese Word Sense Disambiguation Based On A Semi-supervised Method

Posted on:2019-09-05

Degree:Master

Type:Thesis

Country:China

Candidate:Z F Xu

Full Text:PDF

GTID:2428330542472987

Subject:Software engineering

Abstract/Summary:

Word sense disambiguation (WSD) is one of the important research issues in natural language processing. Its purpose is to determine the meaning of ambiguous words in daily communication and conversation. Semantic information and part-of-speech information are two important kinds of linguistic knowledge that help determine the semantic category of an ambiguous word. With the rapid development of natural language processing, word sense disambiguation has become one of the field's difficult problems. This thesis proposes a semi-supervised word sense disambiguation method. Taking the ambiguous word as the center, a WSD model is built from disambiguation features extracted from the adjacent lexical units to its left and right, and a semi-supervised method is then used to optimize the model, which improves the performance of the WSD classifier.

The research work has three parts. First, the research background and significance of word sense disambiguation are presented, the state of WSD research in China and abroad is reviewed and analyzed, and the difficult problems of WSD technology are explained. Second, the dictionary resources and corpora used in the experiments are introduced, including the content organization of the Tong Yi Ci Ci Lin. The background and content of the training and test corpora are described, along with the preprocessing, and the extraction of disambiguation features is explained in detail. Third, two word sense classifiers are constructed from features extracted from the word units to the left and right of an ambiguous word: a Bayesian classifier that uses words, parts of speech, and translations as discriminative features, and a maximum entropy classifier that uses words and parts of speech. A co-training algorithm then uses a large amount of unannotated data to optimize the WSD model. The training data of SemEval-2007 Task 5 and a large unannotated corpus from Harbin Institute of Technology are used to optimize the Bayesian and maximum entropy classifiers, and the optimized WSD model is then tested. Experimental results show that the accuracy of the WSD model improves after the proposed semi-supervised method is applied.
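The abstract does not give implementation details, so the following is only a minimal sketch of the general idea: window features taken from the units left and right of an ambiguous word, a Naive Bayes sense classifier with add-one smoothing, and a co-training style loop that moves confidently labeled examples from an unannotated pool into the training set. Everything here is an assumption made for illustration: the names `window_features`, `NaiveBayes`, `fit_views` and `co_train` are hypothetical, the window size, confidence threshold and round count are arbitrary, and both views use Naive Bayes, whereas the thesis pairs a Bayesian classifier with a maximum entropy classifier.

```python
import math
from collections import Counter, defaultdict


def window_features(tokens, tags, idx, size=2):
    """Return (word view, POS view) from the units around position idx."""
    words, pos = [], []
    for off in range(-size, size + 1):
        j = idx + off
        if off == 0 or not 0 <= j < len(tokens):
            continue  # skip the ambiguous word itself and out-of-range slots
        words.append(f"w{off:+d}={tokens[j]}")
        pos.append(f"p{off:+d}={tags[j]}")
    return words, pos


class NaiveBayes:
    """Multinomial Naive Bayes over string features, add-one smoothing."""

    def fit(self, samples):
        self.sense_counts = Counter()
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()
        for feats, sense in samples:
            self.sense_counts[sense] += 1
            for f in feats:
                self.feat_counts[sense][f] += 1
                self.vocab.add(f)
        return self

    def predict(self, feats):
        """Return (best sense, posterior probability of that sense)."""
        total = sum(self.sense_counts.values())
        scores = {}
        for sense, n in self.sense_counts.items():
            # The extra +1 reserves probability mass for unseen features.
            denom = sum(self.feat_counts[sense].values()) + len(self.vocab) + 1
            score = math.log(n / total)
            for f in feats:
                score += math.log((self.feat_counts[sense][f] + 1) / denom)
            scores[sense] = score
        best = max(scores, key=scores.get)
        top = scores[best]
        norm = sum(math.exp(v - top) for v in scores.values())
        return best, 1.0 / norm


def fit_views(labeled):
    """Train one classifier per view: index 0 = words, index 1 = POS tags."""
    return tuple(
        NaiveBayes().fit([(views[k], sense) for views, sense in labeled])
        for k in (0, 1)
    )


def co_train(labeled, unlabeled, rounds=5, per_round=2, threshold=0.6):
    """Grow the labeled set with each view's most confident predictions.

    labeled:   list of ((word_view, pos_view), sense)
    unlabeled: list of (word_view, pos_view)
    """
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        picked = {}
        for view, clf in enumerate(fit_views(labeled)):
            scored = [(clf.predict(x[view]), i) for i, x in enumerate(pool)]
            scored.sort(key=lambda t: t[0][1], reverse=True)
            for (sense, conf), i in scored[:per_round]:
                if conf >= threshold:
                    picked.setdefault(i, sense)  # first confident view wins
        if not picked:
            break
        labeled += [(pool[i], sense) for i, sense in picked.items()]
        pool = [x for i, x in enumerate(pool) if i not in picked]
    return fit_views(labeled), labeled
```

In use, each annotated instance would first be converted with `window_features` and paired with its sense label, each unannotated instance would be converted the same way without a label, and `co_train` would return the two retrained classifiers together with the enlarged training set. The confidence threshold controls how much label noise the loop admits, so in practice it would need tuning on held-out data.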

Keywords/Search Tags:

word sense disambiguation, natural language processing, semantic category, discriminative feature

Related items

1	Research Of Word Sense Disambiguation Based On Indicators With Semantic Category Extending
2	Research And Application Of Word Sense Disambiguation Method Based On Contextual Semantic
3	Research Of Word Sense Disambiguation Based On Word-sense Category Extending
4	Research On Query Expansion & Key Technologies Based On Semantic Analysis
5	Research On Chinese Word Sense Disambiguation Based On Semantic Analysis
6	Chinese Word Sense Disambiguation Based On Semantic
7	Word Sense Disambiguation Corpus Automatic Acquisition
8	Chinese Word Sense Disambiguation Based On Parsing Tree
9	Research On Word Sense Disambiguation Based On The Strategy Of Field Priority Selection
10	Word Sense Disambiguation Based On Semantic And Lexical Information