
Lexical Semantic Relationship Prediction Based On Word Vector

Posted on: 2019-07-12    Degree: Master    Type: Thesis
Country: China    Candidate: Z H Pan    Full Text: PDF
GTID: 2348330569495772    Subject: Engineering
Abstract/Summary:
Natural Language Processing (NLP) is a significant research direction in computer science and artificial intelligence. Human language exhibits complex relationships among vocabulary, syntactic structure, and textual meaning. As research has deepened in recent years, many researchers have focused on the semantics between words, and the word vector training method proposed by Mikolov has pointed to a new research direction. Trained with an unsupervised method, simple word vector subtraction captures verifiable lexical semantic relationships, such as the vector offset king − man ≈ queen − woman. However, Mikolov also noted that such vector offsets can answer only about 40% of the problems in SemEval-2012 Task 2. Existing work based on relation offset vectors has achieved results mainly on simple semantic relations such as tense, voice, and hyponymy, while complex relations such as whole-part relations and event relations remain to be studied.

To address these problems, this thesis proposes three prediction models based on word vectors (Word2Vec and GloVe) to mine complex relations such as whole-part relations and event relations, and also verifies their applicability to tense and voice relations. The word vectors used in this thesis are trained on Wikipedia, which ensures that the target relations are not deliberately emphasized in the training data. Depending on the order in which the relation-specific offset vectors of the training sets are sorted and clustered, a clustering-first model and a sorting-first model are proposed. The clustering-first model clusters the relation offset vectors with an unsupervised learning method, maps the clusters to relation labels, and predicts relations with a sorting algorithm. The sorting-first model first groups the relation vectors by label, uses a clustering algorithm and a negative sampling model to learn a common relation vector, and finally predicts relations with a sorting algorithm. A total of nine kinds of lexical relations are verified with the two models, with an average accuracy above 95%.

For whole-part relations with transitivity, this thesis obtains six kinds of part-whole relation-inducing word relations through an improved spectral clustering method. It adopts a segment prediction method and a negative sampling model to automatically mine candidate words for the whole-part relation; when candidate words are missing, web data are added to supplement them, and the prediction model is finally used to filter the candidates. The whole process is carried out on open corpora. The precision of a single model reaches 84%, and under a multi-model optimization strategy it rises to 90%.
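As a minimal illustration of the relation-offset idea described above (a sketch of the general technique, not the thesis's exact clustering-first or sorting-first models), the following Python example clusters offset vectors of labeled word pairs with k-means and predicts a new pair's relation by its nearest cluster centroid. It assumes pretrained GloVe vectors loaded through gensim's downloader; the word pairs and relation labels are made-up toy examples.

```python
# Illustrative sketch only: cluster relation offset vectors (w2 - w1) and
# predict the relation of a new pair by the nearest cluster centroid.
# Assumes gensim and scikit-learn are installed; the pairs below are toy examples.
import numpy as np
import gensim.downloader as api
from sklearn.cluster import KMeans
from collections import Counter

vectors = api.load("glove-wiki-gigaword-100")   # pretrained GloVe vectors

# Toy training pairs: (word1, word2, relation label)
pairs = [
    ("man", "king", "role"), ("woman", "queen", "role"),
    ("walk", "walked", "tense"), ("jump", "jumped", "tense"),
    ("wheel", "car", "part-of"), ("leaf", "tree", "part-of"),
]

offsets = np.array([vectors[w2] - vectors[w1] for w1, w2, _ in pairs])
labels = [r for _, _, r in pairs]

# Cluster the offset vectors (clustering-first flavour), then name each
# cluster by the majority relation label among its members.
k = len(set(labels))
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(offsets)
cluster_name = {
    c: Counter(l for l, a in zip(labels, km.labels_) if a == c).most_common(1)[0][0]
    for c in range(k)
}

def predict_relation(w1, w2):
    """Predict the relation of (w1, w2) from the nearest offset centroid."""
    offset = (vectors[w2] - vectors[w1]).reshape(1, -1)
    return cluster_name[int(km.predict(offset)[0])]

print(predict_relation("boy", "prince"))   # expected: "role" (toy example)
```

The thesis's own models additionally involve a sorting step and negative sampling to learn a common relation vector per class; this sketch only shows how relation offsets can be grouped and used for prediction in the simplest case.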
Keywords/Search Tags:NLP, Relationship Prediction, Word Vector