
Chinese Semantic Similarity Dataset Construction and Word Embedding Fused with HowNet

Posted on: 2018-09-01
Degree: Master
Type: Thesis
Country: China
Candidate: Y Wan
GTID: 2428330542970855
Subject: Natural language processing

Abstract/Summary:
Semantic similarity, also referred to as semantic relatedness, is one of the most common and important tasks in Natural Language Processing. Machine translation, word sense disambiguation, and other tasks that must handle semantic information are closely tied to semantic similarity computation. Whether a semantic learning algorithm performs well is usually judged by scoring its output against a gold-standard dataset: the more consistent its results are with the gold standard, the better the algorithm handles semantic information. An objective and fair gold-standard dataset for semantic relatedness is therefore needed to evaluate the strengths and weaknesses of a semantic learning algorithm comprehensively.

The first part of this thesis constructs a gold-standard test set for semantic relatedness using traditional statistical methods and cognitive neuroscience experiments. Analysis of selected word pairs and comparison with existing standard test sets show the following. The test set consists of three parts: semantically similar, semantically related, and semantically unrelated word pairs. Human ratings across the whole test set are highly consistent, and event-related potential (ERP) experiments show that the human brain processes these three kinds of word pairs differently during language cognition. Finally, compared with some existing test sets, the scores in this test set are more evenly distributed and describe the degree of similarity between words more accurately, and the set evaluates the training quality of word vectors consistently with existing datasets.

A word vector, also called a word embedding, applies the idea of distributed representation: the meaning of each word is described by a vector in a semantic vector space, so that semantic computations can be converted into the corresponding vector computations. The quality of word vectors has a major impact on the performance of other Natural Language Processing tasks. Popular existing training methods take co-occurrence information within a word window over a large-scale corpus, propagate the resulting error back to the corresponding parameters to correct them, and ultimately give frequently co-occurring words vectors with high cosine similarity. The second part of this thesis improves this traditional training method by incorporating word-sense knowledge from HowNet: word vectors are trained jointly on the senses associated with each word and on word co-occurrence information, with the aim of improving training quality.
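The abstract does not state which consistency measure is used when word vectors are scored against the gold-standard set. A common choice for word-similarity benchmarks is Spearman's rank correlation between the human ratings and the cosine similarity of each pair's word vectors, and the Python sketch below assumes that setup. The function names, the embeddings, the word pairs, and the scores are all hypothetical illustrations, not code or data from this thesis; pairs containing a word missing from the vocabulary are simply skipped.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    ssx = sum((a - mx) ** 2 for a in rx)
    ssy = sum((b - my) ** 2 for b in ry)
    if ssx == 0.0 or ssy == 0.0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / math.sqrt(ssx * ssy)


def evaluate(
    embeddings: Dict[str, List[float]],
    gold: List[Tuple[str, str, float]],
) -> Tuple[float, int]:
    """Return (rho, pairs used); pairs with an out-of-vocabulary word are skipped."""
    model_scores, human_scores = [], []
    for w1, w2, human in gold:
        if w1 in embeddings and w2 in embeddings:
            model_scores.append(cosine(embeddings[w1], embeddings[w2]))
            human_scores.append(human)
    if len(model_scores) < 2:
        raise ValueError("need at least two in-vocabulary pairs")
    return spearman(model_scores, human_scores), len(model_scores)


# Hypothetical toy data: one similar, one related, and one unrelated pair,
# plus a pair whose second word is missing from the vocabulary.
toy_embeddings = {
    "老师": [1.0, 0.1], "教师": [0.9, 0.2],  # teacher / teacher (similar)
    "医生": [0.2, 1.0], "医院": [0.5, 0.8],  # doctor / hospital (related)
    "苹果": [1.0, 0.0], "汽车": [0.0, 1.0],  # apple / car (unrelated)
}
toy_gold = [
    ("老师", "教师", 9.5),
    ("医生", "医院", 6.0),
    ("苹果", "汽车", 0.5),
    ("老师", "未登录词", 5.0),  # skipped: "未登录词" is not in toy_embeddings
]

if __name__ == "__main__":
    rho, used = evaluate(toy_embeddings, toy_gold)
    print(f"Spearman rho = {rho:.3f} over {used} pairs")
```

On the toy data the cosine scores order the three covered pairs exactly as the hypothetical human ratings do, so the script prints `Spearman rho = 1.000 over 3 pairs`. A rank correlation is used rather than a Pearson correlation on the raw values because human ratings and cosine similarities lie on different scales; only their orderings are compared.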
Keywords/Search Tags: semantic relatedness, word vector, cognitive neuroscience, event-related potentials (ERPs), neural network