Text Distance Measurement Based on Supervised Wasserstein Distance and Fast Neighbor Component Analysis

Posted on: 2021-06-17 | Degree: Master | Type: Thesis
Country: China | Candidate: Y F Xia | Full Text: PDF
GTID: 2518306107459424 | Subject: Mathematics
Abstract/Summary:
With the rapid development of the Internet, social platforms are flooded with text. How to mine this text fully has become an urgent problem. As an important component of text mining, text distance measurement has attracted considerable attention. As a measure of the difference between probability distributions, the Wasserstein distance can be applied effectively even when the distributions do not overlap, because it is continuous, which makes it suitable for many statistical learning models. As a metric learning method, fast neighbor component analysis defines its loss function from a probabilistic perspective and can thereby effectively learn the geometric characteristics of a distribution, which has attracted much attention. In this thesis, we improve the supervised Wasserstein measure and the fast neighbor component analysis algorithm and apply them to text distance measurement. First, we propose two ways to compute an approximate solution of the Wasserstein measure. The first perturbs the constraints with normally distributed noise and adds a regularization term, which transforms the linear program otherwise solved by the simplex method into a convex optimization problem with an approximate solution. The second modifies the entropy-regularized Wasserstein measure, adds a further regularization term, and uses the Sinkhorn iterative algorithm to find an approximate solution of the original problem. Then, starting from fast neighbor component analysis, we replace the underlying Euclidean distance with the Wasserstein distance and add a Wasserstein regularization term to obtain a new gradient update rule that prevents overfitting and speeds up computation. Finally, we combine the two measures into a new algorithm, called SL-WMD, which is better suited to measuring text distance on word2vec semantic representations. To verify its effectiveness, we apply the new algorithm to text classification on the BBCSport dataset and compare it with the SWMD model.
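The entropy-regularized route mentioned in the abstract can be illustrated with a generic Sinkhorn iteration. The sketch below is not the thesis's SL-WMD algorithm and omits its additional regularization term and supervised metric; it only shows the standard alternating scaling updates. The function name `sinkhorn_distance`, the parameters `reg`, `n_iters`, and `tol`, and the random toy embeddings standing in for word2vec vectors are all assumptions made for illustration.

```python
import numpy as np

def sinkhorn_distance(a, b, cost, reg=1.0, n_iters=1000, tol=1e-9):
    """Approximate the Wasserstein cost between histograms a and b
    using entropy regularization and Sinkhorn scaling (generic sketch)."""
    K = np.exp(-cost / reg)              # Gibbs kernel of the cost matrix
    u = np.ones_like(a)
    for _ in range(n_iters):
        u_prev = u
        v = b / (K.T @ u)                # rescale to match column marginals
        u = a / (K @ v)                  # rescale to match row marginals
        if np.max(np.abs(u - u_prev)) < tol:
            break
    plan = u[:, None] * K * v[None, :]   # approximate transport plan
    return float(np.sum(plan * cost)), plan

# Toy stand-ins for two documents: random vectors play the role of word
# embeddings (hypothetical data, not word2vec output).
rng = np.random.default_rng(0)
emb_x = rng.normal(size=(4, 5))          # document X: 4 words, 5-dim vectors
emb_y = rng.normal(size=(3, 5))          # document Y: 3 words, 5-dim vectors
a = np.full(4, 1 / 4)                    # uniform word weights for X
b = np.full(3, 1 / 3)                    # uniform word weights for Y

# Pairwise Euclidean distances between word vectors form the ground cost.
cost = np.linalg.norm(emb_x[:, None, :] - emb_y[None, :, :], axis=-1)

dist, plan = sinkhorn_distance(a, b, cost)
print(f"approximate document distance: {dist:.4f}")
```

A larger `reg` makes the iteration converge faster but blurs the transport plan further from the exact linear-programming solution; a smaller `reg` is closer to the exact distance but may need more iterations or log-domain arithmetic to stay numerically stable.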
Keywords/Search Tags:Wasserstein measure, Fast neighbor component analysis, Text distance, SL-WMD algorithm, Regularization