Research And Improvement Of Text Similarity Calculation Method

Posted on:2022-10-16

Degree:Master

Type:Thesis

Country:China

Candidate:Y Lin

Full Text:PDF

GTID:2518306539498054

Subject:Engineering

Abstract/Summary:

As information technology develops and spreads, it attracts growing attention and makes everyday life considerably more convenient. Applications of big data and artificial intelligence are gradually entering public view. At the same time, people's needs keep growing, and they must extract the information they need from massive amounts of Internet data. Researchers have therefore applied artificial intelligence to natural language processing, producing applications such as automatic summarization, document duplicate checking, text classification and clustering, and automatic question answering. These applications all involve calculating text similarity. The work of this thesis covers four aspects.

First, a hybrid similarity calculation model is proposed. To improve the accuracy of Chinese short text similarity calculation, a new method based on a hybrid strategy is introduced. Based on the semantic distance between words, hierarchical clustering is used to construct a binary clustering tree for the short texts; this improves on the traditional vector space model and yields a keyword-weighted text similarity. Then, by extracting the main components of each sentence, the traditional method based on the grammatical and semantic model is improved to obtain the semantic similarity of the main text. Finally, the two similarities are combined by weighting to give the final text similarity. Experimental results show that this method calculates short text similarity more accurately.

Second, text similarity is calculated with the BERT model. BERT is a pre-trained language model: it learns general language knowledge during pre-training, and the contextual word representations it has learned are then fine-tuned on the current task. To verify its effectiveness, the BERT model is compared with the traditional model. The experimental results show that the BERT method outperforms the traditional method.

Third, an Attention-BiLSTM-BERT model is proposed. Building on previous research, BERT is used to train word vectors, and a BiLSTM network with an attention mechanism is introduced into the model to extract text features. Compared with other deep learning models, the semantic representation of the text is more accurate. Experiments on public datasets and comparisons with other models verify that the model achieves higher performance.

Fourth, based on the basic algorithm research in Chapter 5, a text similarity calculation system is designed and implemented. The system can be used for everyday document duplicate checking, text classification and clustering, and similar tasks, and it has some practical value.
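
The final step of the hybrid method, in which two similarity scores are weighted into one, can be illustrated with a minimal sketch. The abstract does not give the weights, the keyword weighting scheme, or the clustering procedure, so everything named here (`weighted_cosine`, `hybrid_similarity`, `keyword_weights`, `alpha`) is a hypothetical stand-in: a plain keyword-weighted cosine over term counts replaces the clustering-tree model, and the semantic score is assumed to be supplied from elsewhere.

```python
import math
from collections import Counter


def weighted_cosine(tokens_a, tokens_b, keyword_weights=None):
    """Cosine similarity over term-count vectors.

    keyword_weights maps a term to a multiplier (default 1.0).
    This is an assumed stand-in, not the thesis's clustering-tree model.
    """
    keyword_weights = keyword_weights or {}
    vec_a = {t: c * keyword_weights.get(t, 1.0) for t, c in Counter(tokens_a).items()}
    vec_b = {t: c * keyword_weights.get(t, 1.0) for t, c in Counter(tokens_b).items()}

    dot = sum(w * vec_b.get(t, 0.0) for t, w in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty text has no similarity to anything
    return dot / (norm_a * norm_b)


def hybrid_similarity(keyword_sim, semantic_sim, alpha=0.5):
    """Linear combination of two scores; alpha is an assumed free parameter."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * keyword_sim + (1.0 - alpha) * semantic_sim


if __name__ == "__main__":
    a = ["text", "similarity", "calculation"]
    b = ["short", "text", "similarity"]
    kw = weighted_cosine(a, b, keyword_weights={"similarity": 2.0})
    # 0.6 is a placeholder semantic score, not a result from the thesis.
    print(hybrid_similarity(kw, semantic_sim=0.6, alpha=0.5))
```

A linear combination keeps the result in the same range as its inputs, so if both scores lie in [0, 1] the final similarity does too; in practice `alpha` would be tuned on labeled sentence pairs.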

Related items

1	Research On Suggested Sentence Recognition And Suggested Information Extraction
2	Research On Automatic Question Answering Technology Based On Attention Mechanism
3	Research On Short Text Similarity Algorithm Based On BiLSTM And Attention Mechanism
4	A Sentence Representation Method Based On Syntax And Semantic
5	Research On Implicit Text Emotion Analysis Based On BERT And Deep Neural Network
6	Research On Commodity Title Similarity Based On WM-CBOW And Bert Model
7	Sentence-embedding And Similarity Via Hybrid Bidirectional-LSTM And CNN Utilizing Weighted-pooling Attention
8	Research On Sentence Similarity Calculation Method Of Fusion Knowledge
9	Sentence Representation Research Based On Attention Mechanism
10	Design And Implementation Of Sentence Level And Paragraph Level Semantic Similarity Algorithms