The Research And Application Of Unsupervised And Supervised Short Text Similarity Measure

Posted on: 2019-07-08
Degree: Master
Type: Thesis
Country: China
Candidate: G W Ni
Full Text: PDF
GTID: 2428330566996013
Subject: Computer application technology
Abstract/Summary:
With the development and popularization of application scenarios such as search engines, social networks, and chat robots, short text similarity calculation plays an important role in the research and application of information retrieval, text classification, intelligent question answering, and machine translation. Traditional text vectorization methods are better suited to long texts and do not achieve satisfactory results on short texts, which are sparse and noisy. In recent years, with the extensive application of neural network language models, text representation has moved from the traditional high-dimensional sparse vector space to a low-dimensional vector space, which provides a new approach to short text similarity calculation.

This paper presents an unsupervised method for calculating the similarity of short texts. Earth Mover's Distance (EMD), the optimal solution of the transportation problem in linear programming, is used to measure the similarity between two short texts, and the semantic distance between words is measured with Word2Vec. A word position similarity is also introduced: the relative position of words is taken into account when words are moved in the EMD model. With it, the final short text similarity model achieves higher accuracy, recall, and F1 in a k-Nearest Neighbor text classification task.

A supervised short text similarity calculation method is also proposed. Two Convolutional Neural Network (CNN) models with exactly the same structure are constructed, and each takes as input a short text semantic expansion matrix built from Word2Vec. After processing by a fully connected layer, high-level semantic vectors are obtained. Finally, a sigmoid activation layer computes the probability that the two short texts are similar. The model is applied to the "Quora Question Pairs" dataset and achieves good experimental results.

In this paper, the short text similarity calculation method is applied to the task of inspecting operator network fault tickets. A simple text quality inspection process based on short text similarity calculation is designed, and an intelligent work order inspection system based on the B/S (browser/server) architecture is developed. Actual operating results show that the system's performance meets operators' routine quality inspection requirements.
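
The unsupervised measure described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation. It assumes that every token carries equal mass, that the ground cost is the Euclidean distance between word vectors, and that the position term is a simple difference of normalized word positions weighted by a hypothetical parameter `alpha`. The abstract does not give the actual position formula. The names `emd_distance`, `emd_similarity`, and `toy_vectors` are illustrative, and `toy_vectors` stands in for real Word2Vec embeddings.

```python
import numpy as np
from scipy.optimize import linprog

# Toy 2-D embeddings standing in for Word2Vec vectors (illustrative only).
toy_vectors = {
    "king": np.array([1.0, 0.0]),
    "queen": np.array([0.9, 0.1]),
    "apple": np.array([0.0, 1.0]),
}


def emd_distance(tokens_a, tokens_b, vectors, alpha=0.0):
    """EMD between two token lists, solved as a transportation problem.

    vectors: dict mapping token -> 1-D numpy array.
    alpha:   weight of a hypothetical relative-position penalty
             (an assumption, not the formula used in the thesis).
    """
    n, m = len(tokens_a), len(tokens_b)
    # Ground cost: Euclidean distance between word vectors.
    cost = np.array([[np.linalg.norm(vectors[a] - vectors[b]) for b in tokens_b]
                     for a in tokens_a])
    # Assumed position term: gap between normalized positions in [0, 1].
    pos_a = np.arange(n) / max(n - 1, 1)
    pos_b = np.arange(m) / max(m - 1, 1)
    cost = cost + alpha * np.abs(pos_a[:, None] - pos_b[None, :])

    # Flow variables are flattened row-major: flow[i, j] -> index i * m + j.
    a_eq = np.zeros((n + m, n * m))
    for i in range(n):
        a_eq[i, i * m:(i + 1) * m] = 1.0  # mass leaving token i of text A
    for j in range(m):
        a_eq[n + j, j::m] = 1.0           # mass arriving at token j of text B
    b_eq = np.concatenate([np.full(n, 1.0 / n), np.full(m, 1.0 / m)])

    # Minimize total transport cost subject to the mass constraints.
    result = linprog(cost.ravel(), A_eq=a_eq, b_eq=b_eq,
                     bounds=(0, None), method="highs")
    return result.fun


def emd_similarity(tokens_a, tokens_b, vectors, alpha=0.0):
    """Map the distance into (0, 1]; this mapping is also an assumption."""
    return 1.0 / (1.0 + emd_distance(tokens_a, tokens_b, vectors, alpha))
```

With `alpha=0.0` the sketch ignores word order, so `["king", "apple"]` and `["apple", "king"]` have distance zero. A positive `alpha` makes reordered texts less similar, which is the kind of effect the word position similarity is meant to capture.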
Keywords/Search Tags:Short Text Similarity Measures, Earth Mover's Distance, Deep Learning, Semantic Expansion, Network Operation and Maintenance