Research And Application Of Text Similarity Calculation Method Based On Structured Representation Learning

Posted on: 2024-07-28
Degree: Master
Type: Thesis
Country: China
Candidate: X M Chong
GTID: 2568307157974969
Subject: Software engineering
Abstract/Summary:
With the rapid development of the mobile internet, text information on the web is growing explosively, and how to mine useful information from massive text while filtering out duplicate content has become an urgent problem. Solving it usually involves text similarity calculation, and text semantic representation is an important factor affecting similarity calculation. Structured text representation is an important approach to semantic representation: the structure it generates can effectively show the dependency relationships among the semantic blocks of a text and precisely convey the text's semantic content, semantic center, and theme. Furthermore, research indicates that the appropriate semantic representation structure varies with the task, so constructing a structured semantic representation tailored to a specific task is crucial for processing it successfully. However, external parsers typically create a generic semantic representation structure, and using them turns the model architecture into a pipeline in which errors propagate to later stages of processing, hindering global optimization and ultimately compromising the model’s performance. This thesis therefore investigates structured representations of text and the corresponding similarity calculations. The main work of this thesis is as follows:

(1) This thesis proposes a text similarity calculation method based on the Gumbel-Tree-LSTM model. The method uses a pre-trained BERT model to obtain word embeddings, then uses Gumbel-Tree-LSTM to generate a structure tree and obtain a structured embedding of the text. The embedding is then passed to a multilayer perceptron (MLP) for similarity calculation. In contrast to using an external parser to construct a general parse tree, the proposed method generates a structure tree specific to the task of text similarity calculation. Experimental results demonstrate that this method outperforms classical similarity calculation methods.

(2) To address the excessive depth of structure trees and the difficulty of constructing complex structures for lengthy texts, this thesis proposes a text similarity calculation method based on a cascade model. Specifically, the method uses a stacked Gumbel-Tree-LSTM model to generate a structure tree, in which the low-level part analyzes dependencies between the words within a clause and the high-level part analyzes dependencies between clauses. Experimental results indicate that this method achieves higher accuracy and F1 value than the former method based on a single-layer model.

(3) To tackle the challenge of learning the dependency relationships and structures between clauses in the high-level part of the cascade model, and to prevent the model from taking "shortcuts" during training, this thesis presents a text similarity calculation method based on auxiliary task learning. The method uses an auxiliary task to constrain parameter learning and to help the high-level part of the model acquire semantic dependencies and structures. Experimental results indicate that this method further improves performance on text similarity calculation tasks.

(4) Finally, the thesis applies the text similarity calculation method to design and implement a Xi’an tourism knowledge Q&A system. The Q&A system accurately matches relevant content from a semantic perspective and returns it to the user. Overall, the system helps users obtain useful information about Xi’an tourism efficiently.
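As a rough illustration of the tree-building step in contribution (1), the sketch below shows how a Gumbel-Tree-LSTM-style model can choose which adjacent pair of nodes to merge at each step by adding Gumbel noise to the candidate scores. It is a minimal sketch under stated assumptions, not the thesis's implementation: the toy vectors stand in for BERT word embeddings, `compose` (an element-wise mean) stands in for the Tree-LSTM cell, `score` with a `query` vector stands in for the learned scoring function, and `cosine` stands in for the MLP similarity layer. All of these names are hypothetical.

```python
import math
import random


def gumbel_softmax_select(scores, temperature=1.0, rng=random):
    """Pick one index from `scores` after adding Gumbel(0, 1) noise.

    Returns (index, probs). In a trainable model the hard choice would be
    used in the forward pass and `probs` for gradients (straight-through).
    """
    noisy = []
    for s in scores:
        # Clamp the uniform sample away from 0 and 1 to keep the logs finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        noisy.append((s - math.log(-math.log(u))) / temperature)
    top = max(noisy)
    exps = [math.exp(v - top) for v in noisy]
    total = sum(exps)
    probs = [e / total for e in exps]
    index = max(range(len(probs)), key=probs.__getitem__)
    return index, probs


def compose(left, right):
    # Assumption: element-wise mean as a placeholder for the Tree-LSTM cell.
    return [(a + b) / 2.0 for a, b in zip(left, right)]


def score(vec, query):
    # Assumption: dot product with a query vector as the merge score.
    return sum(a * b for a, b in zip(vec, query))


def build_tree(tokens, vectors, query, temperature=1.0, rng=random):
    """Merge adjacent nodes bottom-up until one root remains.

    Returns (tree, vector), where `tree` is nested tuples of tokens.
    """
    nodes = list(zip(tokens, vectors))
    while len(nodes) > 1:
        # One candidate parent for every pair of adjacent nodes.
        candidates = [
            compose(nodes[i][1], nodes[i + 1][1]) for i in range(len(nodes) - 1)
        ]
        idx, _ = gumbel_softmax_select(
            [score(c, query) for c in candidates], temperature, rng
        )
        merged = ((nodes[idx][0], nodes[idx + 1][0]), candidates[idx])
        nodes[idx:idx + 2] = [merged]
    return nodes[0]


def leaves(tree):
    # Read the tokens back from the nested-tuple tree, left to right.
    return [tree] if isinstance(tree, str) else leaves(tree[0]) + leaves(tree[1])


def cosine(a, b):
    # Assumption: cosine similarity as a placeholder for the MLP.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    tokens = ["the", "cat", "sat", "down"]
    vectors = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.2, 0.8]]
    tree, root = build_tree(tokens, vectors, query=[1.0, -1.0])
    print(tree, root)
```

Because the merge decisions depend only on scores computed inside the model, the tree shape can be trained together with the similarity objective instead of being fixed by an external parser. Under the same assumptions, the cascade model in contribution (2) would correspond to running `build_tree` once per clause and then once more over the resulting clause vectors.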
Keywords/Search Tags: Similarity calculation, Text representation, Cascade model, Auxiliary task learning