
Mongolian Short Text Semantic Similarity Calculation Based On Deep VAE Integrated With Topic Information

Posted on: 2019-12-14
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Fu
Full Text: PDF
GTID: 2428330563456743
Subject: Computer Science and Technology
Abstract/Summary:
Text, a convenient and fast information carrier, makes up a large part of Web information resources, and its volume is growing exponentially. Mongolian is one of the most representative minority languages in China. In recent years, Mongolian websites have become increasingly rich in resources, and the number of Mongolian short texts has grown rapidly. As a result, intelligent automatic processing of Mongolian text has received much more attention from scholars. Short text similarity calculation plays an important role in text processing and is one of the core problems of automated information processing. One of the urgent problems in short text similarity calculation is learning accurate representations of Mongolian short texts. A weakness of existing semantic representation methods for Mongolian text is that they cannot accurately capture semantic features because short texts lack adequate contextual information, even though the similarity of these features reflects, to some extent, the similarity of the short texts themselves in the similarity calculation task. To address the missing context of Mongolian short texts, this thesis integrates topic information into the VAE model as context information and proposes a variational auto-encoder model integrated with topic information, called TVAE. This work alleviates the lack of context to a certain extent and learns better representations of Mongolian short texts. Because the VAE model learns a probability distribution from sample data, it can learn the semantic information of Mongolian short texts more accurately than other algorithms. We use the NMF model and the LDA model, respectively, to extract topic information. Each of the two topic models is combined with the VAE to represent Mongolian short texts, and we perform clustering analysis for similarity calculation on a corpus of 200,000 Mongolian short texts. In the experiments, we analyze the effects of stop words and affixes, vector dimension, and network depth, and we compare the TVAE model with its best parameters against other models. Our empirical studies show that the TVAE model significantly improves the accuracy and clustering results of similarity calculation for Mongolian short texts, and that the lack of context in Mongolian short texts is alleviated to a certain extent.
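The abstract does not say how topic information enters the VAE. As a minimal sketch, assuming the topic distribution of a short text (for example from NMF or LDA) is simply appended to its feature vector before encoding, the following shows that step together with two standard VAE components (reparameterized sampling and the KL term against a standard normal prior) and a cosine similarity for comparing latent vectors. All function names and the concatenation strategy are hypothetical illustrations, not the thesis's implementation.

```python
import math
import random


def with_topic_context(text_vec, topic_vec):
    """Append a topic distribution to a text feature vector.

    Assumption: concatenation is one plausible way to integrate topic
    information; the thesis abstract does not specify the mechanism.
    """
    return list(text_vec) + list(topic_vec)


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps, with eps ~ N(0, 1) per dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy values only: a 3-dim text vector and a 2-topic distribution.
    encoder_input = with_topic_context([0.2, 0.0, 0.8], [0.7, 0.3])
    # Pretend an encoder produced these latent parameters for two texts.
    mu_a, log_var_a = [0.5, -0.1], [-1.0, -1.0]
    mu_b, log_var_b = [0.4, 0.0], [-1.0, -1.0]
    z_a = reparameterize(mu_a, log_var_a, rng)
    print(len(encoder_input), z_a, kl_to_standard_normal(mu_a, log_var_a))
    print(cosine_similarity(mu_a, mu_b))
```

In a trained model, the latent means of two short texts would be the natural inputs to `cosine_similarity`; the encoder and decoder networks themselves are omitted here.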
Keywords/Search Tags:semantic representation, Variational Auto-encoder, topic model, semantic similarity, Mongolian processing