
Refining Word Vector Representation With Reliable Lexical Semantic Constraints

Posted on: 2020-05-10  Degree: Master  Type: Thesis
Country: China  Candidate: Y S Liang
GTID: 2518305981952789  Subject: Master of Engineering
Abstract/Summary:
Word vector representations are now used in a wide range of downstream natural language processing (NLP) applications, including machine translation, text classification, and sentiment analysis. High-quality word vectors give a model more knowledge about the language and can therefore further improve its performance. Studies have shown that word vectors trained on large corpora perform better after being refined with semantic constraints extracted from lexical taxonomies. However, manually or semi-manually constructed lexical taxonomies are commonly of uneven reliability. This reduces the correctness of the semantic constraints extracted from them, and unreliable semantic knowledge harms the refinement of word vectors. To address this, this thesis proposes an approach for extracting reliable lexical semantic constraints and uses these constraints to refine word vector representations. The main contributions of this thesis are as follows:

(1) A method for refining word representations based on evaluating the reliability of semantic constraints across taxonomies. Considering the potential differences between heterogeneous taxonomies, the method first extracts semantic constraints from each taxonomy separately and then assesses the reliability of the constraints through the interaction between the taxonomies. In addition, with a focus on the applicability of the semantic constraints, this thesis improves the original refinement mechanism. As a result, the refined word vectors perform better on word similarity calculation.

(2) A method for refining word representations based on extracting reliable lexical semantic constraints by combining the word vectors and the taxonomies. Low-frequency words easily produce unreliable word vectors during training. To deal with this, the thesis proposes a method for extracting reliable lexical semantic constraints that extends the approach of the first contribution. It relies on both lexicon-vector interaction and the interaction between heterogeneous taxonomies. Synonyms from the lexical taxonomies are assessed for reliability using the word vectors, and the unreliable ones are deleted. Constraints deleted by mistake can then be recovered through the interaction between heterogeneous taxonomies. Moreover, a transmission mechanism of core words avoids the negative effects of unreliable word vectors. Overall, the method efficiently reduces the negative effects of erroneous semantic constraints and unreliable word vectors during refinement, so the performance of the refined word vectors improves further.

(3) A method for refining word representations based on evaluating the reliability of within-class semantic constraints with cluster quality assessment. Building on the constraint extraction method of the second contribution, this thesis examines how weights are assigned among within-class synonym constraints during refinement and explores a better index for evaluating constraint reliability. Specifically, it finds that within-class compactness, one of the indexes used in cluster quality assessment, performs better in reliability evaluation. It also finds that weights quantified according to the degree of reliability benefit the refinement. The method thus takes full account of how much the reliability of a semantic constraint should influence the word vector representations. Experimental results show that it further improves the word vectors' performance on word similarity calculation.

The proposed approaches are tested on PKU 500, the dataset of the NLPCC-ICCPOL 2016 shared task on Chinese word similarity measurement. Using two state-of-the-art vector refinement methods, the thesis refines word vectors with the reliable semantic constraints extracted by the proposed approaches, and the refined vectors perform better on word similarity calculation. Experiments show that all the proposed components, including the interaction between heterogeneous taxonomies, lexicon-vector interaction, the transmission mechanism of core words, and the reliability evaluation index, help improve the performance of word vector representations. The proposed methods achieve a best Spearman score of 0.6570, a 26.8% improvement over the best result in the shared task.
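
The general idea of refining vectors with reliability-weighted synonym constraints and scoring them by Spearman correlation can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the two refinement methods it uses or give its weighting formula, so the retrofitting-style update, the function names (`refine`, `cosine`, `spearman`), the `alpha` parameter, and the toy vectors are all hypothetical.

```python
import numpy as np


def refine(vectors, constraints, alpha=1.0, iterations=10):
    """Retrofitting-style refinement with weighted synonym constraints.

    vectors:     dict word -> 1-D numpy array (original corpus-trained vectors)
    constraints: dict (word_a, word_b) -> reliability weight (assumed in [0, 1])
    alpha:       strength of the pull back toward the original vector (assumed)
    """
    # Build a symmetric neighbour list, skipping zero-weight constraints
    # and constraints that mention words without a vector.
    neighbors = {}
    for (a, b), weight in constraints.items():
        if a in vectors and b in vectors and weight > 0:
            neighbors.setdefault(a, []).append((b, weight))
            neighbors.setdefault(b, []).append((a, weight))

    refined = {w: np.asarray(v, dtype=float).copy() for w, v in vectors.items()}
    for _ in range(iterations):
        for word, nbrs in neighbors.items():
            # Weighted average of the original vector and the current
            # neighbour vectors; more reliable constraints pull harder.
            total = alpha + sum(weight for _, weight in nbrs)
            pulled = alpha * np.asarray(vectors[word], dtype=float)
            for other, weight in nbrs:
                pulled = pulled + weight * refined[other]
            refined[word] = pulled / total
    return refined


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def spearman(xs, ys):
    """Spearman correlation as the Pearson correlation of ranks.

    Simplification: assumes there are no tied values in xs or ys.
    """
    rank_x = np.argsort(np.argsort(xs))
    rank_y = np.argsort(np.argsort(ys))
    return float(np.corrcoef(rank_x, rank_y)[0, 1])


# Toy data (hypothetical): one fairly reliable synonym pair, one unrelated word.
toy_vectors = {
    "happy": np.array([1.0, 0.0]),
    "glad": np.array([0.0, 1.0]),
    "car": np.array([1.0, 1.0]),
}
toy_constraints = {("happy", "glad"): 0.9}
toy_refined = refine(toy_vectors, toy_constraints)
print(cosine(toy_vectors["happy"], toy_vectors["glad"]))
print(cosine(toy_refined["happy"], toy_refined["glad"]))
```

In this sketch, a constraint with a low reliability weight moves its two words only slightly, and a weight of zero leaves them untouched, which is one simple way to keep erroneous constraints from distorting the vectors. For an evaluation in the style of the shared task, `spearman` would be applied to the model's cosine similarities and the human similarity ratings over the same word pairs.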
Keywords/Search Tags: word vector representation, amendment, reliable constraints, lexical taxonomies, interaction