
Research And Improvement Of Question Matching Method In Question Answering System Based On Word Vector

Posted on: 2021-04-06 | Degree: Master | Type: Thesis
Country: China | Candidate: C C Yu | Full Text: PDF
GTID: 2428330611499043 | Subject: Applied statistics
Abstract/Summary:
With the growth of the Internet industry and advances in science and technology, the volume of information in every field has exploded. Question answering systems, a representative product of artificial intelligence, have accumulated extremely large question databases. Extracting short, accurate information from such a massive question database has become a major challenge for researchers. Similar-question detection (question matching) in the question answering system is an effective way to address this problem, and accurately representing the semantic information expressed by a question is a crucial step in question matching.

At present, the most commonly used text representation model is the vector space model (VSM). Because of the model's high dimensionality and its inaccurate representation of text semantics, scholars have proposed using word vectors to construct question vectors. Word2Vec and GloVe are two commonly used word vector training models. Based on these two word vector models, this thesis analyzes the common question vector construction methods and finds that they have certain defects in representing questions. The thesis therefore proposes a part-of-speech weighted question vector construction method based on classification and keyword extraction (QWP-CKE). The method introduces a classification algorithm and a keyword extraction algorithm and combines the part-of-speech information of the words in the text with the V-TF-IDF weighting method, so that it makes full use of the influence of feature words on the question and thereby improves the accuracy of the question vector representation.

In question matching, it is usually necessary to calculate question similarity and select the question with the highest similarity as the match. After an in-depth analysis and comparison of traditional question similarity calculation methods, and weighing their advantages and disadvantages, this thesis integrates word vectors and cosine similarity into the BM25 similarity calculation and proposes an improved BM25 text similarity calculation method (BM-CS). The method uses the statistical information of the text data in the question and also takes into account the semantic distance between texts in vector space, so that calculating question similarity with the improved BM25 algorithm gives a better matching effect.

The comparative analysis of the experimental results in this thesis mainly verifies three conclusions: (1) The word vector training model based on GloVe has a better matching effect in question matching than the one based on Word2Vec. (2) The QWP-CKE question vector construction method proposed in this thesis matches questions better than other commonly used question vector construction methods. (3) Compared with other question similarity calculation methods, the BM-CS algorithm proposed in this thesis has a better question matching effect.
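
The abstract does not give the formulas behind QWP-CKE or BM-CS, so the following is only a minimal sketch of the general idea it describes: build a question vector as a weighted average of word vectors, then combine a BM25 score with cosine similarity. Everything specific here is an assumption for illustration, not the thesis's method: the toy two-dimensional word vectors, the per-word `weights` (standing in for part-of-speech or TF-IDF style weights), the linear blend controlled by `alpha`, and the standard Okapi BM25 parameters `k1` and `b`.

```python
import math
from collections import Counter

import numpy as np


def question_vector(tokens, word_vecs, weights):
    """Weighted average of the word vectors of known tokens (unknown tokens are skipped)."""
    dim = len(next(iter(word_vecs.values())))
    total = np.zeros(dim)
    weight_sum = 0.0
    for tok in tokens:
        if tok in word_vecs:
            w = weights.get(tok, 1.0)  # hypothetical per-word weight, default 1.0
            total += w * np.asarray(word_vecs[tok], dtype=float)
            weight_sum += w
    return total / weight_sum if weight_sum > 0 else total


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    return float(u @ v / (nu * nv)) if nu > 0 and nv > 0 else 0.0


def bm25_scores(query, corpus, k1=1.5, b=0.75):
    """Standard Okapi BM25 score of the query against each tokenized question."""
    n = len(corpus)
    avg_len = sum(len(doc) for doc in corpus) / n
    df = Counter(tok for doc in corpus for tok in set(doc))
    scores = []
    for doc in corpus:
        tf = Counter(doc)
        score = 0.0
        for tok in query:
            if tok not in tf:
                continue
            idf = math.log(1 + (n - df[tok] + 0.5) / (df[tok] + 0.5))
            norm = tf[tok] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[tok] * (k1 + 1) / norm
        scores.append(score)
    return scores


def blended_scores(query, corpus, word_vecs, weights, alpha=0.5):
    """Assumed blend: alpha * (max-normalized BM25) + (1 - alpha) * cosine."""
    bm25 = bm25_scores(query, corpus)
    top = max(bm25)
    bm25_norm = [s / top if top > 0 else 0.0 for s in bm25]
    qv = question_vector(query, word_vecs, weights)
    cos = [cosine(qv, question_vector(doc, word_vecs, weights)) for doc in corpus]
    return [alpha * s + (1 - alpha) * c for s, c in zip(bm25_norm, cos)]


# Toy data, invented purely for illustration.
word_vecs = {
    "reset": [1.0, 0.0], "change": [0.9, 0.1], "password": [0.0, 1.0],
    "refund": [-1.0, 0.0], "order": [0.0, -1.0],
}
weights = {"password": 2.0}  # e.g. a noun weighted above other words
corpus = [["how", "reset", "password"], ["how", "refund", "order"]]
query = ["change", "password"]

scores = blended_scores(query, corpus, word_vecs, weights)
best_match = corpus[int(np.argmax(scores))]
```

In this sketch the BM25 term rewards exact word overlap ("password"), while the cosine term lets "change" and "reset" count as close even though they are different words. In practice the word vectors would come from a trained Word2Vec or GloVe model rather than a hand-written dictionary.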
Keywords/Search Tags:question answering system, question matching, word vector, similarity calculation