Chinese Semantic Similarity Dataset Construction And Word Embedding Fused With HowNet

Posted on:2018-09-01

Degree:Master

Type:Thesis

Country:China

Candidate:Y Wan

Full Text:PDF

GTID:2428330542970855

Subject:Natural language processing

Abstract/Summary:

Semantic similarity (often discussed together with semantic relatedness) is one of the most common and important tasks in Natural Language Processing. Machine translation, word sense disambiguation, and other tasks that handle semantic information depend closely on semantic similarity computation. Whether a semantic learning algorithm performs well is usually judged by comparing its output with a gold-standard dataset: the higher the agreement with the human judgments, the better the algorithm handles semantic information. An objective and fair gold-standard dataset for semantic relatedness therefore allows the strengths and weaknesses of a semantic learning algorithm to be evaluated comprehensively.

The first part of this thesis constructs a gold-standard test set for semantic relatedness using traditional statistical methods and cognitive neuroscience experiments. Comparisons on sample word pairs and against existing standard test sets show that the constructed set consists of three parts (semantically similar, semantically related, and semantically unrelated word pairs) and that human ratings across the whole set are highly consistent. Event-related potential (ERP) experiments further show that the human brain processes these three kinds of word pairs differently during language cognition. Finally, compared with some existing test sets, the scores in the new set are more evenly distributed and describe the degree of similarity between words more accurately, and evaluations of word-vector training on it are consistent with those obtained on existing datasets.

Word vectors, also called word embeddings, apply the idea of distributed representation: the meaning of each word is described by a vector in a semantic vector space, so that semantic computations can be converted into computations on the corresponding vectors. The quality of the word vectors has a major impact on the performance of other NLP tasks. Popular existing training methods collect co-occurrence information within a word window over a large-scale corpus, propagate the resulting error back to the corresponding parameters to correct them, and ultimately give frequently co-occurring words vectors with high cosine similarity. The second part of this thesis improves this traditional training method by incorporating word-sense knowledge from HowNet: the word vectors are trained jointly on the senses associated with each word and on the words' co-occurrence information, with the aim of improving training quality.
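The evaluation protocol described above (agreement between a model's scores and human ratings) is commonly measured with Spearman's rank correlation; the abstract does not name the metric, so that choice is an assumption here. The sketch below computes cosine similarity between word vectors and Spearman's rho (tied values get averaged ranks), and reuses the same correlation to check agreement between two annotators. The word pairs, vectors, and annotator scores are hypothetical toy values, not data from the thesis.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def ranks(xs):
    """1-based ranks; tied values all receive the mean of the positions they occupy."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def spearman(xs, ys):
    """Spearman's rho: Pearson correlation of the rank vectors (inputs must not be constant)."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

def evaluate(embeddings, gold):
    """Spearman correlation between model cosines and gold scores; OOV pairs are skipped."""
    model, human = [], []
    for w1, w2, score in gold:
        if w1 in embeddings and w2 in embeddings:
            model.append(cosine(embeddings[w1], embeddings[w2]))
            human.append(score)
    return spearman(model, human)

# Hypothetical toy data: two annotators rate three word pairs on a 0-10 scale.
pairs = [("cat", "dog"), ("dog", "bone"), ("cat", "car")]
annotator_a = [9.0, 6.0, 1.0]
annotator_b = [8.0, 7.0, 2.0]
agreement = spearman(annotator_a, annotator_b)  # inter-annotator consistency

# Gold score per pair = mean of the two annotators' ratings.
gold = [(w1, w2, (a + b) / 2) for (w1, w2), a, b in zip(pairs, annotator_a, annotator_b)]
embeddings = {
    "cat": [1.0, 0.9, 0.0],
    "dog": [0.9, 1.0, 0.1],
    "bone": [0.5, 0.8, 0.6],
    "car": [0.0, 0.1, 1.0],
}
print(agreement, evaluate(embeddings, gold))
```

Pairs whose words have no vector are silently skipped in this sketch; a real benchmark run would usually report how many pairs were out of vocabulary, since dropping them changes the score.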
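The window-based training described above can be illustrated with a toy skip-gram model with negative sampling, one popular method of this kind; the abstract does not say which method the thesis starts from, so this choice is an assumption. Each observed (center, context) pair is a positive example and randomly drawn words are negatives, and the prediction error is propagated back to both the center vector and the context vectors. The optional `senses` argument is a hypothetical illustration of injecting sense knowledge by adding each sense label as an extra pseudo-context of its word; it is not HowNet's data format or API, and not necessarily the joint training scheme used in the thesis. Negatives are drawn uniformly and may occasionally coincide with the true context, a simplification of word2vec, which samples from a smoothed unigram distribution.

```python
import math
import random

def window_pairs(tokens, window):
    """(center, context) pairs for every word within `window` positions of the center."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def sigmoid(x):
    x = max(-30.0, min(30.0, x))  # clamp to avoid overflow in math.exp
    return 1.0 / (1.0 + math.exp(-x))

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def train_sgns(corpus, senses=None, dim=8, window=2, negatives=3,
               epochs=200, lr=0.05, seed=0):
    """Toy skip-gram with negative sampling; negatives are drawn uniformly.

    `senses` is a hypothetical {word: [sense labels]} map. Each label becomes an
    extra pseudo-context of its word, so words sharing a sense are pulled together.
    """
    rng = random.Random(seed)
    pairs = [p for sent in corpus for p in window_pairs(sent, window)]
    for word, labels in (senses or {}).items():
        pairs += [(word, "SENSE:" + s) for s in labels]
    vocab = sorted({t for pair in pairs for t in pair})
    w_in = {t: [rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for t in vocab}
    w_out = {t: [0.0] * dim for t in vocab}
    for _ in range(epochs):
        rng.shuffle(pairs)
        for center, context in pairs:
            v, grad_v = w_in[center], [0.0] * dim
            # one observed context (label 1) plus `negatives` random words (label 0)
            samples = [(context, 1.0)] + [(rng.choice(vocab), 0.0) for _ in range(negatives)]
            for target, label in samples:
                u = w_out[target]
                g = lr * (label - sigmoid(sum(a * b for a, b in zip(v, u))))
                for k in range(dim):
                    grad_v[k] += g * u[k]  # accumulate the error for the center word
                    u[k] += g * v[k]       # correct the context vector right away
            for k in range(dim):
                v[k] += grad_v[k]
    return w_in

# Hypothetical toy corpus: "cat" and "dog" share contexts, "car" does not.
corpus = [
    ["the", "cat", "drinks", "milk"], ["the", "dog", "drinks", "milk"],
    ["the", "cat", "eats", "fish"], ["the", "dog", "eats", "fish"],
    ["a", "car", "needs", "fuel"], ["a", "truck", "needs", "fuel"],
]
senses = {"cat": ["animal"], "dog": ["animal"], "car": ["vehicle"], "truck": ["vehicle"]}

vecs = train_sgns(corpus)
fused = train_sgns(corpus, senses=senses)
print(cosine(vecs["cat"], vecs["dog"]), cosine(vecs["cat"], vecs["car"]))
```

Because "cat" and "dog" share every context in this toy corpus, their vectors should end up close, while "car", whose contexts are disjoint, should not; the sense labels add a second, corpus-independent signal that pulls same-sense words together.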

Related items

1	Research On Sentiment Analysis For Web Texts Based On Semantic Relatedness
2	Research On Statistical Word-level Semantic Relatedness Computation
3	Research On Concept And Short Text Semantic Relatedness Calculation Method
4	Two projects in theoretical neuroscience: A convolution-based metric for neural membrane potentials and a combinatorial connectionist semantic network method
5	The Research Of Automatic Single Text Summarization Based On Latent Semantic Analysis
6	Semantic Similarity Measurement Of Short Text By Convolutional Neural Network Based On Multi-Dimensional Attention On Word Vector
7	Word Sense Disambiguation Based On Semantic Relatedness Computation
8	Study Of Cognitive Neuroscience Mechanism Of Heuristic Problems Solving And Methods Of FMRI Data Analysis
9	Research Of Semantic Relatedness Measure Based On Wikipedia Structure
10	Multiple Documents Automatically Summary Based On Semantic Word Vector