Research And Application Of Similarity Calculation In Mixed Long And Short Texts

Posted on:2022-11-19

Degree:Master

Type:Thesis

Country:China

Candidate:C L Xu

Full Text:PDF

GTID:2518306764467554

Subject:Computer Software and Application of Computer

Abstract/Summary:

Textual similarity calculation is one of the most important tasks in natural language processing. The growth of social media has increased the number of short texts, so long and short texts are now frequently mixed, and calculating the similarity between them has become an urgent problem. Most existing studies focus on texts of similar length and fall into three types of models: representation structures, interaction structures and pre-training structures. The representation and interaction structures apply the same feature extractor to both inputs, so they cannot capture the feature differences between long and short texts. The pre-training structure lacks interactive features. In addition, most existing search systems rely on matching segmented words without considering semantic relevance. To address these problems, this thesis focuses on the following three aspects.

(1) A model based on a pseudo-siamese network is designed to calculate the similarity of mixed long and short texts. It uses two different feature extractors, one for long texts and one for short texts. The feature extractor for long texts is Longformer, which avoids the information loss introduced by splitting long texts and reduces the computation of the attention mechanism. The feature extractor for short texts fuses dual-channel features from BiLSTM and ABCNN. This method overcomes the differences in sequential features and feature quantity between long and short texts, and improves the accuracy of textual similarity calculation.

(2) A pre-training model with interactive features is designed to calculate the similarity between mixed long and short texts. It combines the advantages of the interaction structure and the pre-training structure. It uses Transformer-XL to handle the long-range dependency problem in long texts, and a permutation language model to represent the texts and extract interactive features at the same time. A GRU layer is added to learn deeper text features, and a residual network is added to avoid network degradation. As a result, the model further improves the accuracy of similarity calculation between mixed long and short texts.

(3) A news search system with semantic matching is designed and developed. Built on the interactive pre-training model for similarity calculation in mixed long and short texts, it improves the semantic relevance between search targets and search results. After a detailed analysis of the system requirements, the outline design, detailed design and database design are carried out. The resulting system can intelligently analyse news on the Internet. It consists of a user module, a data management module, a search module and a data analysis module. It passes the functional and performance tests and runs stably.

The first similarity model designed in this thesis compensates for the shortcomings of the representation structure, and the second combines the advantages of the interaction structure and the pre-training structure. Both achieve high accuracy in calculating the similarity of mixed long and short texts. The news search system with semantic matching also provides related data analysis functions and has some reference value in the field of public opinion analysis.
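The pseudo-siamese idea in aspect (1), two branches that do not share an encoder and whose outputs are compared, can be illustrated with a minimal sketch. This is not the thesis model: the two encoders below are hypothetical toy stand-ins (a windowed, mean-pooled bag of words for the long text and a plain bag of words for the short text) in place of Longformer and the BiLSTM/ABCNN fusion, and the names `encode_long`, `encode_short` and `similarity` are assumptions made for illustration only.

```python
import math
from collections import Counter


def _unit(vec):
    """Scale a sparse vector (dict: word -> weight) to unit length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {k: w / norm for k, w in vec.items()} if norm else {}


def encode_short(text):
    """Toy short-text branch (assumption): one bag of words over the whole text."""
    return _unit(Counter(text.lower().split()))


def encode_long(text, window=8):
    """Toy long-text branch (assumption): bag of words per window, then mean pooling."""
    tokens = text.lower().split()
    windows = [tokens[i:i + window] for i in range(0, len(tokens), window)]
    pooled = Counter()
    for chunk in windows:
        for word, weight in _unit(Counter(chunk)).items():
            pooled[word] += weight / len(windows)
    return _unit(pooled)


def similarity(long_text, short_text):
    """Cosine similarity between the outputs of the two different branches."""
    a, b = encode_long(long_text), encode_short(short_text)
    # Both vectors are unit length, so the dot product is the cosine.
    return sum(w * b[k] for k, w in a.items() if k in b)


doc = ("the central bank raised interest rates again this quarter "
       "to slow inflation while markets reacted with caution")
print(similarity(doc, "bank raised interest rates"))
print(similarity(doc, "football match tonight"))
```

The point of the sketch is only the structure: each input length gets its own extractor, and the comparison happens on the branch outputs. In a trained model the two branches would be learned jointly so that their outputs live in a comparable space.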

Keywords/Search Tags:

Text Representation, Text Similarity, Pseudo-siamese Network, Interactive Features, Pre-training Model

Related items

1	Similar Text Discrimination Based On Siamese Network
2	Research On Text Representation And Text Classification Method Based On Adversarial Training
3	Research On Text Representation Model And Application In Text Classification And Natural Language Inference
4	Research On Text Representation Based On Siamese Neural Network And Hybrid Neural Network
5	Research On Text Similarity Recognition Based On LSTM
6	Study On Similarity-based Text Clustering Algorithm And Its Application
7	Research On Image Generation Algorithm Based On Text Semantics
8	Research On Question Similarity Of Intelligent Question Answering System Based On Deep Text Matching Model
9	The Study Of Measures And Applications Of Short Text Semantic Similarity
10	Research On Semantic Similarity Calculation Method And Data Augmentation In Chinese Short Text