Research On Text Representation Model And Similarity Calculation Algorithm

Posted on:2021-04-06

Degree:Master

Type:Thesis

Country:China

Candidate:J Jiang

GTID:2428330611470907

Subject:Computer application technology

Abstract/Summary:

Text representation and text similarity computation are among the most important tasks in natural language processing, and they provide technical support for downstream tasks. This thesis studies a sentence embedding model and a text similarity computation algorithm. The main contents are as follows:

1. To address the insufficient semantic information in sentence embeddings, a sentence representation model based on feature contribution is proposed. Before computing the sentence embedding, the model introduces an improved information gain formula that combines intra-class and inter-class word frequency to construct a feature contribution factor. This factor is used to remove feature words that contribute little to the task, which yields a sentence embedding carrying more accurate information. Experimental results show that the proposed sentence embedding model achieves higher accuracy on two basic tasks, text classification and text similarity calculation, which verifies the effectiveness of the model.

2. Most similarity algorithms consider either the semantic information or the structural information of the text, but not both. The thesis proposes a multi-model weighted fusion method for text similarity computation, which aims to improve accuracy by combining the advantages of multiple similarity algorithms. First, building on the Word Mover's Distance algorithm, the thesis constructs multi-feature fusion weights to further mine the semantic and contextual information of the text, and proposes a text similarity algorithm based on these weights. Second, the hierarchical IIG-SIF similarity algorithm is used to exploit the spatial structure information in the text. Finally, a linear weighted model is established to combine the two similarity results, which improves the accuracy of the text similarity algorithm. Controlled experiments show that the fused algorithm improves on both the Word Mover's Distance algorithm and the IIGSIFSim algorithm, and outperforms the classic algorithms. The method extracts the semantic information of the text, captures the relationship between word order and spatial structure in the text, and improves the accuracy of text similarity computation.
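The two ideas in the abstract, filtering out low-contribution feature words and linearly combining two similarity scores, can be illustrated with a minimal Python sketch. The abstract does not give the thesis's improved information gain formula, so the sketch substitutes the standard information gain of a term's presence with respect to the class label. The function names (`information_gain`, `filter_low_contribution`, `fuse_similarities`), the `threshold` parameter, and the weight `alpha` are all hypothetical and chosen for illustration only. The two scores passed to the fusion step stand in for the outputs of a semantic method and a structural method, and are not computed here.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(term, docs, labels):
    """Standard information gain of a term's presence w.r.t. the class label.

    Assumption: this is the textbook formula, not the thesis's improved one.
    `docs` is a list of token lists, `labels` the matching class labels.
    """
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    h_class = entropy(Counter(labels).values())
    h_conditional = 0.0
    for subset in (with_term, without_term):
        if subset:  # skip empty partitions to avoid dividing by zero
            weight = len(subset) / len(labels)
            h_conditional += weight * entropy(Counter(subset).values())
    return h_class - h_conditional


def filter_low_contribution(doc, docs, labels, threshold):
    """Keep only tokens whose information gain reaches `threshold` (hypothetical cutoff)."""
    return [t for t in doc if information_gain(t, docs, labels) >= threshold]


def fuse_similarities(sim_semantic, sim_structural, alpha=0.5):
    """Linear weighted combination of two similarity scores.

    Assumption: `alpha` in [0, 1] weights the semantic score and
    (1 - alpha) weights the structural score.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * sim_semantic + (1.0 - alpha) * sim_structural


# Toy corpus: "good"/"bad" separate the classes, "movie"/"plot" do not.
docs = [["good", "movie"], ["bad", "movie"], ["good", "plot"], ["bad", "plot"]]
labels = ["pos", "neg", "pos", "neg"]
kept = filter_low_contribution(["good", "movie"], docs, labels, threshold=0.5)
fused = fuse_similarities(0.8, 0.4, alpha=0.25)
```

In this toy corpus, "good" perfectly separates the two classes while "movie" appears equally often in both, so only "good" survives the filter. In practice, the cutoff and the fusion weight would presumably be tuned on held-out data.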

Related items

1	Research On Internet Short Text Message Oriented Multi-Document Automatic Summarization
2	Computational Methods Of Sentence Distance Based On Multi-modal Word Embedding
3	Chinese Sentence Similarity Computation Based On Multi-features Fusion
4	Research On Text Similarity Algorithm Based On WMD Distance
5	Research On Sentences Similarity Computation Based On Multi-information Fusion
6	Research On Semantic Expression Based On Knowledge Source Embedding And Multi-modal Data Fusion
7	Research On Multi-label Classification Method Of Chinese Short Text Based On Multi-dimensional Feature Fusion
8	Research Of Text Quality Analysis Algorithm Based On Deep Learning
9	A Sentiment Analysis Model Based On Joint Training Of Multi-Type Word Embedding Fusion And Semantic Cosine Distance Feature Fusion
10	The Design And Implementation Of Multi-features Combination In Sentence Similarity Computation