
Research On Text Distance Calculation Based On Orthogonal Matrix Factorization Of Heterogeneous Graphs

Posted on: 2019-04-12 | Degree: Master | Type: Thesis
Country: China | Candidate: Y N Zheng | Full Text: PDF
GTID: 2428330548969563 | Subject: Software engineering
Abstract/Summary:
Text distance calculation is a fundamental task in natural language processing and an active research topic in the field. It plays a key role in tasks such as information retrieval, document classification, and automatic question answering. The simplest text distance models are based on the bag-of-words representation, which ignores the semantic relationships between words: it cannot recognize synonymy or polysemy, and this limits its accuracy. In recent years, scholars have proposed the Weighted Textual Matrix Factorization (WTMF) method. WTMF takes the words missing from a text into account when calculating text distance, which alleviates the data sparsity problem and indirectly improves accuracy. It has also contributed to improvements on related tasks and, to some extent, made up for the shortcomings of traditional methods. However, WTMF considers only the relationship between texts and words; it does not consider the relationship between texts, nor does it suppress the excessive influence of high-frequency words. Moreover, duplicate information may be encoded during the model's iterations, so that some information appears repeatedly; this redundancy affects the text distance calculation.

To address these problems, this thesis carries out two lines of work on text distance calculation. First, two improvements to the WTMF model are proposed. One considers the relevance between words to construct a weighted undirected graph, and normalizes word frequency and word weight to suppress the excessive influence of high-frequency words. The other transforms the matrix through orthogonalization, which removes the information repeated during model iteration and gives the representation better discriminative power. Experiments were conducted on an authoritative data set, and the results were analyzed and compared; they show that the proposed method is much more effective than the WTMF method. Second, the improved model is applied to two tasks: linking heterogeneous media and summarizing news and microblogs. In both tasks, the method incorporating the text distance calculation is evaluated on authoritative data sets, and the experiments show that the proposed method achieves remarkable results.
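
The two ingredients named in the abstract, a weighted factorization that down-weights missing words and an orthogonalization of the latent representation, can be illustrated with a small sketch. This is not the thesis's implementation. It assumes the standard WTMF-style objective, sum over cells of W_ij (P_i · Q_j - X_ij)^2 plus an L2 penalty, where W_ij is 1 for observed words and a small constant for missing ones. It swaps in plain gradient descent as the optimizer to stay dependency-free, and it assumes the orthogonalization step is a Gram-Schmidt pass over the latent dimensions of the document vectors. All function names, hyperparameters, and the toy count matrix are hypothetical. The graph construction and frequency normalization are not shown.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cell_weight(x, w_missing):
    # Observed cells count fully; missing (zero) cells get a small weight.
    return 1.0 if x != 0 else w_missing

def weighted_loss(X, P, Q, w_missing, lam):
    # sum_ij W_ij (P_i . Q_j - X_ij)^2 + lam * (||P||^2 + ||Q||^2)
    fit = sum(cell_weight(x, w_missing) * (dot(P[i], Q[j]) - x) ** 2
              for i, row in enumerate(X) for j, x in enumerate(row))
    reg = sum(dot(v, v) for v in P + Q)
    return fit + lam * reg

def factorize(X, k=2, w_missing=0.01, lam=0.05, lr=0.01, steps=2000, seed=0):
    # Full-batch gradient descent on the weighted loss (a simplification).
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in X]     # word vectors
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in X[0]]  # document vectors
    for _ in range(steps):
        gP = [[2 * lam * v for v in p] for p in P]
        gQ = [[2 * lam * v for v in q] for q in Q]
        for i, row in enumerate(X):
            for j, x in enumerate(row):
                g = 2 * cell_weight(x, w_missing) * (dot(P[i], Q[j]) - x)
                for d in range(k):
                    gP[i][d] += g * Q[j][d]
                    gQ[j][d] += g * P[i][d]
        P = [[v - lr * g for v, g in zip(p, gp)] for p, gp in zip(P, gP)]
        Q = [[v - lr * g for v, g in zip(q, gq)] for q, gq in zip(Q, gQ)]
    return P, Q

def orthonormalize_dims(vectors):
    # Gram-Schmidt over the k latent dimensions (the columns of `vectors`),
    # so that no dimension repeats information carried by an earlier one.
    dims = [list(col) for col in zip(*vectors)]
    basis = []
    for v in dims:
        for b in basis:
            proj = dot(v, b)
            v = [x - proj * y for x, y in zip(v, b)]
        norm = math.sqrt(dot(v, v))
        basis.append([x / norm for x in v] if norm > 1e-12 else v)
    return [list(row) for row in zip(*basis)]

def cosine_distance(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - dot(a, b) / (na * nb)

# Toy word-by-document counts (rows: words, columns: documents); values are made up.
X = [
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
    [1, 0, 0, 1],
]
P, Q = factorize(X)
Q_orth = orthonormalize_dims(Q)
print("loss:", weighted_loss(X, P, Q, 0.01, 0.05))
print("d(doc0, doc1):", cosine_distance(Q_orth[0], Q_orth[1]))
print("d(doc0, doc2):", cosine_distance(Q_orth[0], Q_orth[2]))
```

Text distance is then read off as the cosine distance between rows of `Q_orth`. Because missing cells carry a small but nonzero weight, each document vector is also shaped by the words it lacks, which is the property the abstract credits with easing data sparsity.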
Keywords/Search Tags: WTMF, text distance, orthogonal decomposition, heterogeneous graph