
Research And Improvement Of Text Similarity Calculation Method

Posted on: 2022-10-16
Degree: Master
Type: Thesis
Country: China
Candidate: Y Lin
Full Text: PDF
GTID: 2518306539498054
Subject: Engineering
Abstract/Summary:
The continuous development and application of information technology has attracted growing attention and has made everyday life considerably more convenient. Applications built on technologies such as big data and artificial intelligence are gradually entering public view. As a result, demand keeps increasing, and people need to extract the information they want from massive amounts of Internet data. Researchers have therefore applied artificial intelligence to natural language processing, producing applications such as automatic summarization, document duplicate checking, text classification and clustering, and automatic question answering. All of these applications involve the calculation of text similarity. The work of this thesis covers four aspects.

First, a hybrid similarity calculation model is proposed. To improve the accuracy of similarity calculation for Chinese short texts, a new method based on a hybrid strategy is proposed. Hierarchical clustering over the semantic distance between words is used to construct a binary clustering tree for the short texts; this improves on the traditional vector space model and yields a keyword-weighted text similarity. Next, the main components of each sentence are extracted, which improves on the traditional method based on a grammatical and semantic model and yields the semantic similarity of the sentence backbone. Finally, the two similarities are combined by weighting to obtain the final text similarity. Experimental results show that this method calculates short text similarity more accurately.

Second, text similarity is calculated with the BERT model. BERT is a pre-trained language model: it learns general language knowledge during pre-training and is then fine-tuned on the current task, which adapts the contextual word representations it has learned. To verify its effectiveness, BERT is compared with traditional models. The experimental results show that the BERT method outperforms the traditional methods.

Third, an Attention-BiLSTM-BERT model is proposed. Building on the preceding work, BERT is used to obtain word vectors, and a BiLSTM network with an attention mechanism is introduced to extract text features. Compared with other deep learning models, it represents the semantics of the text more accurately. Experiments on public datasets and comparisons with other models verify that the model achieves higher performance.

Fourth, based on the basic algorithm research in Chapter 5, a text similarity calculation system is designed and implemented. The system can be used for everyday tasks such as document duplicate checking and text classification and clustering, and therefore has practical value.
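The final weighting step of the hybrid method can be illustrated with a minimal sketch. The abstract gives neither the weights nor the exact similarity formulas, so everything here is an assumption made for illustration: the function names `weighted_cosine` and `hybrid_similarity`, the parameter `alpha`, and the use of a plain keyword-weighted bag-of-words cosine in place of the thesis's clustering-tree-based vector space model.

```python
import math
from collections import Counter


def weighted_cosine(tokens_a, tokens_b, keyword_weights=None):
    """Cosine similarity of two token lists under a bag-of-words model.

    keyword_weights (hypothetical) maps a token to a multiplier;
    tokens that are not listed keep a weight of 1.0.
    """
    keyword_weights = keyword_weights or {}
    vec_a = {t: c * keyword_weights.get(t, 1.0) for t, c in Counter(tokens_a).items()}
    vec_b = {t: c * keyword_weights.get(t, 1.0) for t, c in Counter(tokens_b).items()}

    dot = sum(w * vec_b.get(t, 0.0) for t, w in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty text is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def hybrid_similarity(sim_keyword, sim_semantic, alpha=0.5):
    """Linear combination of two similarity scores.

    alpha is an assumed mixing weight in [0, 1]; the abstract does not
    state the value used in the thesis.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * sim_keyword + (1.0 - alpha) * sim_semantic


if __name__ == "__main__":
    a = ["text", "similarity", "method"]
    b = ["text", "similarity", "model"]
    sim_kw = weighted_cosine(a, b, keyword_weights={"similarity": 2.0})
    # The backbone semantic similarity is supplied as a placeholder number here.
    print(hybrid_similarity(sim_kw, sim_semantic=0.8, alpha=0.6))
```

In the thesis the second score comes from comparing the main components of the sentences; the sketch takes it as a number so that only the combination step is shown.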
Keywords/Search Tags:Sentence similarity, BERT, BiLSTM, Attention mechanism