Deep Learning-based Long Text Matching

Posted on: 2022-01-04
Degree: Master
Type: Thesis
Country: China
Candidate: C W Xu
Full Text: PDF
GTID: 2518306572483074
Subject: Computer software and theory
Abstract/Summary:
In recent years, as deep learning has matured, it has been widely applied across natural language processing, including long text matching, which has long been an important research topic in the field. Accurate and efficient automatic matching of long texts remains a research focus in both academia and industry. However, most existing work models only the sequential relationships within the text and ignores the semantic and logical matching between the different semantic units of a long text, which leads to low matching accuracy. The problem is especially pronounced when the two texts differ greatly in length: their semantic units become hard to align, which makes long text matching even more difficult.

To address these challenges, this thesis focuses on three problems: (1) how to encode the semantic units of different granularities embedded in a long text; (2) how to build a semantic encoding of the long text from the different representations of its semantic units; (3) how to integrate external knowledge so that the matching model accounts for the semantic knowledge involved in long text matching.

For these three problems, this thesis proposes a learning framework for long text matching. The framework first constructs the topic distribution of the long text pair and, taking short text sequences as the smallest semantic unit, maps the short sequences in the text pair to different topics. It then computes the local similarity of the pair (i.e., the relevance of the pair's short sequences under each topic) and also considers the global relevance of the pair (sentence-level text similarity matching). Finally, an information-passing fusion method aggregates the two similarities (local and global) to obtain the similarity of the text pair. In addition, external knowledge is introduced into the model to compute the similarity of the semantic knowledge involved in sentence pairs: a representation-based learning model encodes the knowledge involved in the text and integrates it into the sentence-level embedding, so that the enhanced representation of the text pair supports similarity matching at the knowledge level.

Experiments on real Chinese news datasets (CNSE and CNSS) show that the topic-based long text splitting logic and the graph-structure-based representation method proposed in this thesis, combined with knowledge fusion, enhance the model's ability to summarize text semantics. This indicates that the approach further enables the model to capture the knowledge and information in long texts, and it achieves higher accuracy than previous long text matching models.
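The local/global pipeline described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the thesis's model: the `TOPICS` keyword sets stand in for a learned topic distribution, bag-of-words cosine similarity stands in for the learned encoders and graph-based representations, and the weighted average with the hypothetical parameter `alpha` stands in for the fusion step. The knowledge-enhancement component is not modeled.

```python
import math
import re
from collections import Counter

# Hypothetical topic vocabularies (assumption); the thesis builds a topic
# distribution from the text pair instead of using fixed keyword lists.
TOPICS = {
    "economy": {"market", "stock", "bank", "price"},
    "sports": {"match", "team", "goal", "league"},
}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def split_sentences(text):
    """Split a long text into short sequences (here: sentences)."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_by_topic(text):
    """Assign each sentence to the topic with the most keyword hits."""
    groups = {name: Counter() for name in TOPICS}
    for sentence in split_sentences(text):
        tokens = tokenize(sentence)
        hits = {name: sum(t in kw for t in tokens) for name, kw in TOPICS.items()}
        best = max(hits, key=hits.get)
        if hits[best] > 0:  # sentences matching no topic are skipped
            groups[best].update(tokens)
    return groups


def local_similarity(doc_a, doc_b):
    """Average per-topic similarity over topics present in either text."""
    ga, gb = group_by_topic(doc_a), group_by_topic(doc_b)
    scores = [cosine(ga[t], gb[t]) for t in TOPICS if ga[t] or gb[t]]
    return sum(scores) / len(scores) if scores else 0.0


def global_similarity(doc_a, doc_b):
    """Similarity of the two texts taken as a whole."""
    return cosine(Counter(tokenize(doc_a)), Counter(tokenize(doc_b)))


def match_score(doc_a, doc_b, alpha=0.5):
    """Fuse local and global similarity with a simple weighted average."""
    local = local_similarity(doc_a, doc_b)
    return alpha * local + (1 - alpha) * global_similarity(doc_a, doc_b)
```

Calling `match_score(text_a, text_b)` returns a value between 0 and 1. Two texts whose sentences fall under different topics get a local similarity of 0 even when they share common words, which is the kind of signal a purely sequence-level comparison would miss.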
Keywords/Search Tags:natural language processing, text matching, knowledge graph and semantic enhancement, graph neural network