
Research On Chinese Word Semantic Similarity Computation

Posted on: 2018-11-27
Degree: Master
Type: Thesis
Country: China
Candidate: J H Pei
Full Text: PDF
GTID: 2348330536460962
Subject: Computer application technology

Abstract/Summary:
Word semantic similarity is a measure of the degree to which the meanings of two words resemble each other. Computing word semantic similarity is a fundamental, core task: it maps the abstract relationship of "similarity between words" onto a real value, so that a natural language processing problem can be transformed into a machine learning problem. Its performance directly affects a wide range of tasks in natural language processing and information retrieval. In recent years, embedding-based methods and their refinements have become the frontier and a hot topic of research in this field. This thesis studies Chinese word semantic similarity computation, focusing on how to improve embedding-based word similarity computation, and divides the research into two parts:

(1) Embedding-based Word Similarity Computation without Semantic Constraints

We use machine translation techniques and an LSTM network, respectively, to improve a standard Skip-gram model. First, the standard Skip-gram model is trained on different corpora to obtain basic word embeddings, and the influence of corpus size and quality on the embedding model is analyzed experimentally. Second, we construct a relationship between Chinese and English words through machine translation; specifically, we substitute large-scale English word embeddings for the Chinese ones to obtain better performance. Finally, the word similarity computation problem is transformed into a word relationship prediction problem, in which the word relationship is learned from coherent sentences with an LSTM network.

(2) Embedding-based Word Similarity Computation with Semantic Constraints

We propose an improved counter-fitting model that incorporates semantic constraints into pre-trained word embeddings to compute word semantic similarity. First, a web crawler expands the context of the words: we collect the sentences in which a word occurs or a word pair co-occurs as the "context", and we gather synonyms and antonyms to extend the existing hand-built semantic lexicons. Second, we compute word semantic similarity from the semantic lexicons, the retrieval results, and the pre-trained word vectors. Finally, the improved counter-fitting model optimizes the pre-trained word vectors: a polynomial objective function is constructed from the semantic constraints and a topology-preservation term, and gradient descent is used to minimize it. The semantic constraints include not only synonym and antonym constraints but also similarity constraints.

The experimental results show that the lexicon-based method has an inherent advantage when coverage of known words is high, while the embedding-based and retrieval-based methods are more practical when many unknown words are present. In addition, the counter-fitting method that incorporates semantic constraints into the word embeddings achieves state-of-the-art performance on the PKU-500 dataset, with a Spearman's rank correlation coefficient of 0.552, outperforming the lexicon-based, retrieval-based, and embedding-based models.
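Word similarity from embeddings is typically computed as the cosine of the two word vectors. A minimal sketch with toy 3-dimensional vectors (the words and vector values are illustrative placeholders, not trained Skip-gram embeddings from the thesis):

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two word vectors; 1.0 means same direction."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy embeddings standing in for trained Skip-gram vectors.
emb = {
    "国王": np.array([0.9, 0.1, 0.2]),
    "女王": np.array([0.8, 0.2, 0.3]),
    "苹果": np.array([0.1, 0.9, 0.0]),
}

sim_royal = cosine_similarity(emb["国王"], emb["女王"])      # related pair
sim_unrelated = cosine_similarity(emb["国王"], emb["苹果"])  # unrelated pair
```

With vectors like these, the related pair scores higher than the unrelated pair, which is exactly the signal a similarity benchmark measures.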
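The counter-fitting optimization described above can be sketched as follows. This is a simplified, Euclidean-distance variant of the idea (synonym attraction, antonym repulsion, and a topology-preservation term, minimized by gradient descent); the margin, weights, step size, and toy word pairs are assumptions for illustration, not the thesis's actual polynomial objective or hyperparameters.

```python
import numpy as np

def counter_fit(vectors, synonyms, antonyms, margin=2.0, lam=0.5, lr=0.05, steps=200):
    """Simplified counter-fitting: pull synonym pairs together, push antonym
    pairs at least `margin` apart, and keep every vector near its pretrained
    position (topology preservation). Returns updated copies of the vectors."""
    v0 = {w: vec.copy() for w, vec in vectors.items()}  # pretrained anchors
    v = {w: vec.copy() for w, vec in vectors.items()}
    for _ in range(steps):
        # Gradient of the preservation term: lam * ||v_w - v0_w||^2
        grad = {w: lam * 2.0 * (v[w] - v0[w]) for w in v}
        for a, b in synonyms:               # attraction term: ||v_a - v_b||^2
            d = v[a] - v[b]
            grad[a] += 2.0 * d
            grad[b] -= 2.0 * d
        for a, b in antonyms:               # repulsion hinge: (margin - dist)^2
            d = v[a] - v[b]
            dist = np.linalg.norm(d)
            if 1e-9 < dist < margin:
                g = -2.0 * (margin - dist) * d / dist
                grad[a] += g
                grad[b] -= g
        for w in v:
            v[w] -= lr * grad[w]
    return v

# Toy pretrained vectors and constraints (illustrative words only).
vecs = {
    "高兴": np.array([1.0, 0.0]),
    "快乐": np.array([0.0, 1.0]),
    "悲伤": np.array([1.1, 0.1]),
}
fitted = counter_fit(vecs, synonyms=[("高兴", "快乐")], antonyms=[("高兴", "悲伤")])
```

After fitting, the synonym pair ends up closer together and the antonym pair farther apart than in the pretrained space, while the preservation term keeps all vectors near their original positions.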
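Evaluation by Spearman's rank correlation (the metric behind the reported 0.552 on PKU-500) compares the ranking of model scores against the ranking of human judgments. A minimal sketch, ignoring tied ranks:

```python
import numpy as np

def spearman(x, y):
    """Spearman's rank correlation: the Pearson correlation of the ranks.
    Note: this simple ranking (argsort of argsort) does not average ties."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return float((rx @ ry) / np.sqrt((rx @ rx) * (ry @ ry)))
```

Because only ranks matter, any monotonic relationship between model scores and human scores yields a coefficient of 1.0, which is why Spearman is preferred over Pearson for similarity benchmarks.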
Keywords/Search Tags:Semantic Similarity Computation, Word Embedding, LSTMs, Semantic Constraints