
Research On Semantic Similarity Matching Algorithm Of Questions Based On Deep Learning

Posted on: 2022-08-28; Degree: Master; Type: Thesis
Country: China; Candidate: J Kou; Full Text: PDF
GTID: 2518306494971449; Subject: Software engineering
Abstract/Summary:
As question answering systems and search engines continue to develop, accurately matching user questions to their corresponding answers has become very important. Many question matching methods have emerged. However, because Chinese word segmentation and semantic acquisition are complex, the Chinese semantic equivalence task, which directly judges the semantics of two questions without a given scenario, still suffers from the problem that questions with the same meaning may be misjudged. High-precision question similarity algorithms are therefore playing an increasingly important role in the era of big data.

Traditional question similarity matching methods still have many problems. On the one hand, low-level feature extraction is insufficient, and information extraction cannot be balanced between long and short sentences. On the other hand, feature information is lost when the two questions are matched against the results computed by the algorithm, and as deep learning training deepens, semantic matching shows errors and deficiencies.

To address these problems, this thesis proposes RFEM (rich feature extraction model), a model for extracting rich feature information. The model extracts features from multiple perspectives in order to compute and retain as much of the underlying feature information in massive data as possible. It also improves the semantic matching of the two sentences, aligning them effectively to determine whether their semantic content is consistent. The main work of this thesis falls into two parts:

(1) An algorithm fusion method is proposed in which multiple models extract features at the same time. In the encoding layer, CNN and LSTM each extract features: CNN focuses on local features, and LSTM focuses on sequence features. Extensive experiments and an analysis of visualization results for long and short sentences show that CNN extracts information from short sentences more accurately, while LSTM extracts semantic features from long sentences better. In addition, a residual connection is incorporated into the iterative computation, and the two encoding methods are applied to the same sentence, so that the feature information of the encoding layer is retained as fully as possible.

(2) A matching algorithm based on a variant multi-head cyclic attention mechanism is proposed. In the alignment layer, the information of the two sequences is aligned, and the attention weights of the sentences are computed iteratively through an N-layer attention mechanism. Each time the alignment layer is re-entered, a residual connection adds back the original feature information. Unlike standard multi-head attention, the variant used in this model does not divide the initial vector into equal parts and recombine them, which retains more of the original sentence's feature information. As the network becomes deeper, the backward flow of gradient information is blocked. After N rounds of attention, a fully connected neural network reduces the dimensionality, which mitigates the information loss caused by the difficulty of training the network.

Finally, extensive experiments show that the public Chinese BQ data set has certain annotation defects. Similarity is computed through Baidu's public interface, and the data is cleaned based on an analysis of the results. The cleaned data is denoted BQ+, and the accuracy of the overall model is compared on both data sets. The results show that the accuracy of the RFEM model on the BQ and BQ+ data sets is significantly better than that of the other models.
Keywords/Search Tags: sentence semantic equivalence identification, feature extraction, sentence matching, variant multi-head attention
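
The alignment layer described in the abstract (several attention heads that each see the whole, unsplit vector, N rounds of attention with the original features added back, and a final fully connected reduction) might be sketched roughly as follows. This is only an illustrative sketch, not the thesis's implementation. The function names (`full_width_head`, `align`, `iterative_alignment`), the random weights, the averaging of heads, the use of unprojected tokens as values, and the shared projection for both sentences are all assumptions made for the example.

```python
import math
import random

def softmax(row):
    # Numerically stable softmax over a list of scores.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(W, x):
    # W has shape (out_dim, in_dim); x has length in_dim.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def random_matrix(rows, cols, rng):
    # Hypothetical stand-in for learned weights.
    scale = 1.0 / math.sqrt(cols)
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def full_width_head(a, b, Wq, Wk):
    # One head: each token of `a` attends over all tokens of `b`.
    # Wq and Wk are d x d, so the head works on the full vector
    # instead of a 1/h slice as in standard multi-head attention.
    d = len(a[0])
    queries = [matvec(Wq, tok) for tok in a]
    keys = [matvec(Wk, tok) for tok in b]
    out = []
    for q in queries:
        scores = [sum(x * y for x, y in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Assumption: values are the unprojected tokens of `b`.
        out.append([sum(w * tok[c] for w, tok in zip(weights, b)) for c in range(d)])
    return out

def align(a, b, heads):
    # Assumption: head outputs are averaged rather than concatenated.
    d = len(a[0])
    per_head = [full_width_head(a, b, Wq, Wk) for Wq, Wk in heads]
    return [[sum(h[t][c] for h in per_head) / len(per_head) for c in range(d)]
            for t in range(len(a))]

def iterative_alignment(a, b, n_layers=2, n_heads=3, out_dim=2, seed=0):
    rng = random.Random(seed)
    d = len(a[0])
    a0, b0 = a, b  # original encodings, re-added on every round
    for _ in range(n_layers):
        heads = [(random_matrix(d, d, rng), random_matrix(d, d, rng))
                 for _ in range(n_heads)]
        new_a = align(a, b, heads)
        new_b = align(b, a, heads)
        # Residual step: add the original features back in.
        a = [[x + y for x, y in zip(na, oa)] for na, oa in zip(new_a, a0)]
        b = [[x + y for x, y in zip(nb, ob)] for nb, ob in zip(new_b, b0)]
    # Fully connected reduction after the N attention rounds.
    W_out = random_matrix(out_dim, d, rng)
    return [matvec(W_out, t) for t in a], [matvec(W_out, t) for t in b]

# Toy encodings: 3 tokens and 2 tokens, each of dimension 4.
q1 = [[0.1, 0.3, -0.2, 0.5], [0.4, -0.1, 0.2, 0.0], [0.0, 0.2, 0.1, -0.3]]
q2 = [[0.2, 0.1, 0.0, 0.4], [-0.1, 0.3, 0.2, 0.1]]
r1, r2 = iterative_alignment(q1, q2)
print(len(r1), len(r1[0]), len(r2), len(r2[0]))  # 3 2 2 2
```

Because every head keeps the full dimension, the cost per head is higher than in the standard split-and-recombine scheme. The abstract presents that as the price of retaining more of the original sentence's features.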