Research On Chinese Polysemy Disambiguation Method Based On VCK-vector Model

Posted on:2020-09-26

Degree:Master

Type:Thesis

Country:China

Candidate:Z Zhou

GTID:2428330599955707

Subject:Industrial Engineering

Abstract/Summary:

Human interaction with computers has developed over a long period: from binary code, to assembly language for sending instructions, to high-level programming languages that invoke the computer's various functions. Yet even high-level programming languages fall short of a goal that people have long pursued: letting computers understand human natural language. Natural language processing (NLP) emerged to address this problem. The central difficulty in NLP is that natural language, unlike high-level programming languages or assembly languages, cannot be reduced so that each statement or instruction maps to a unique binary code that the computer can interpret. The reason is the polysemy and ambiguity present in natural language at the level of articles, sentences, and words. Eliminating ambiguity has therefore become a key and difficult problem in NLP, and ambiguity limits the effectiveness and quality of applications in fields such as machine translation, text processing, information retrieval, and data analysis.

Against this background, this thesis constructs a combined polysemy disambiguation model based on word vectors, the VCK-vector model, which combines a part-of-speech tagging model based on the Viterbi algorithm, the CBOW language model, and the K-Means clustering algorithm. The model is analyzed through part-of-speech distribution comparison, a semantic relevance task, and clustering effect analysis, and is then evaluated on polysemous words from the Modern Chinese Polysemy Dictionary.

Compared with word vectors trained by the language model alone, the VCK-vector model narrows the range of meanings of a polysemous word in context, improves context-based disambiguation, and reflects the relationship between a polysemous word and its related words more clearly and accurately. In verification, the disambiguation accuracy of the VCK-vector model reached up to 81.7%, which is 1.7% higher than a statistical disambiguation method based on a synonym lexicon and 29.7% higher than using only the CBOW language model for polysemy disambiguation. Finally, the thesis validates the model experimentally through the organization of experimental data, analysis of the experimental results, and comparison with Baidu AI word vectors. The results show that the proposed combined disambiguation model is effective and feasible for polysemy disambiguation in large Chinese corpora.
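The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of clustering context vectors to separate the senses of a polysemous word. It is not the thesis's VCK-vector implementation. The assumptions are: hand-made 2-d toy vectors stand in for embeddings that would normally come from a trained CBOW model, the English word "bank" stands in for a Chinese polysemous word, the Viterbi-based part-of-speech tagging step is omitted, and all function names (`context_vector`, `kmeans`, `disambiguate`) are hypothetical.

```python
import math

# Hypothetical toy vectors; a real system would load CBOW-trained embeddings.
TOY_VECTORS = {
    "river": [1.0, 0.1],
    "water": [0.9, 0.2],
    "shore": [0.8, 0.0],
    "money": [0.1, 1.0],
    "loan": [0.0, 0.9],
    "cash": [0.2, 0.8],
}


def context_vector(words):
    """Average the vectors of the context words that have an embedding."""
    vecs = [TOY_VECTORS[w] for w in words if w in TOY_VECTORS]
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]


def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def kmeans(points, k, iters=20):
    """Lloyd's K-Means with a deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Pick the point farthest from all centroids chosen so far.
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old centroid if a cluster happens to be empty.
            updated.append(
                [sum(d) / len(members) for d in zip(*members)] if members else centroids[j]
            )
        if updated == centroids:
            break
        centroids = updated
    return [nearest(p, centroids) for p in points], centroids


# Contexts in which the ambiguous word "bank" was observed (toy data).
contexts = [
    ["river", "water"],
    ["shore", "river"],
    ["money", "loan"],
    ["cash", "loan"],
]
points = [context_vector(c) for c in contexts]
labels, centroids = kmeans(points, k=2)


def disambiguate(context_words):
    """Assign a new occurrence to the nearest sense cluster."""
    return nearest(context_vector(context_words), centroids)
```

In this sketch each cluster plays the role of one sense: a new occurrence is assigned to the cluster whose centroid is closest to its averaged context vector. The number of clusters `k` is fixed by hand here; how the thesis chooses it is not stated in the abstract.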

Keywords/Search Tags:

natural language processing, polysemy disambiguation, VCK-vector, CBOW language model, K-Means clustering

Related items

1	An Approach To The Polysemy Disambiguation In Natural Language Understanding And Its Application In The Intelligent Instrument Design
2	Research On Name Disambiguation In The Field Of Journalism
3	Natural Language Processing-A Study Of Vectorization Of Chinese Words And Short Texts
4	Polysemy Of Adjectives In Natural Language Understanding And Its Application In The Field Of Product Designing
5	Word Sense Disambiguation Corpus Automatic Acquisition
6	Design And Implementation Of Probabilistic Disambiguation Model Based On BCG
7	Emotion Analysis In Natural Language Processing Based On Eye Tracker
8	Research On E-Commerce Commodity Title Category Classification Algorithm Based On Natural Language Processing Technology
9	Research On Word-level Ambiguity Resolution Method
10	Natural Language Processing Aiming To The Core Texts Of Scientific Literature