Mongolian Short Text Semantic Similarity Calculation Based On Deep VAE Integrated With Topic Information

Posted on:2019-12-14

Degree:Master

Type:Thesis

Country:China

Candidate:Y J Fu

Full Text:PDF

GTID:2428330563456743

Subject:Computer Science and Technology

Abstract/Summary:


Text is a convenient and fast carrier of information and makes up a large share of Web information resources, and its volume is growing exponentially. Mongolian is one of the most representative minority languages in China. In recent years, Mongolian websites have become increasingly rich in resources, and the number of Mongolian short texts has grown rapidly. As a result, intelligent automatic processing of Mongolian text has attracted increasing attention from scholars.

Short text similarity calculation plays an important role in text processing and is one of the core problems of automated information processing. One urgent problem in computing short text similarity is learning accurate representations of Mongolian short texts. Existing semantic representation methods for Mongolian text cannot accurately capture semantic features because short texts lack adequate contextual information, yet in the similarity calculation task it is the similarity between these features that reflects, to some extent, the similarity of the short texts themselves.

To address the missing context of Mongolian short texts, this thesis integrates topic information, used as contextual information, into the variational auto-encoder (VAE) and proposes a topic-integrated variational auto-encoder called TVAE. This approach alleviates the lack of context to a certain extent and learns better representations of Mongolian short texts. Because the VAE learns a probability distribution from the sample data, it can capture the semantic information of Mongolian short texts more accurately than other algorithms. Two topic models, non-negative matrix factorization (NMF) and latent Dirichlet allocation (LDA), are used separately to extract topic information. Each is combined with the VAE to represent Mongolian short texts, and a clustering analysis is carried out to evaluate similarity calculation on a corpus of 200,000 Mongolian short texts. The experiments analyze the effects of stop words and affixes, vector dimension, and network depth, and the TVAE model with the best parameters is compared with other models. Experimental results show that TVAE significantly improves both the accuracy of similarity calculation and the clustering results for Mongolian short texts, and that it alleviates the lack of context in Mongolian short texts to a certain extent.
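
The abstract does not give TVAE's equations, so the following is only a sketch of the building blocks it names, under explicit assumptions. Topics are extracted with NMF using the standard Lee-Seung multiplicative updates (LDA would be an alternative topic extractor and is not shown). Topic information is injected by concatenating each text's topic proportions to its normalised term counts; this integration scheme and the helper `augment_with_topics` are hypothetical, not the thesis's method. The VAE contributes the reparameterized latent sample and the Gaussian KL term, and texts are compared by cosine similarity. The encoder/decoder networks and the training loop are omitted, and every function name is illustrative.

```python
import math
import random


def matmul(A, B):
    """Plain-Python matrix product of A (n x p) and B (p x m)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Factor a non-negative doc-term matrix V (n x m) as W (n x k) @ H (k x m)
    with Lee-Seung multiplicative updates for the squared Frobenius loss."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(Wt, matmul(W, H))  # H <- H * (W^T V) / (W^T W H)
        H = [[H[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)] for a in range(k)]
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(matmul(W, H), Ht)  # W <- W * (V H^T) / (W H H^T)
        W = [[W[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)] for i in range(n)]
    return W, H


def frobenius_error(V, W, H):
    WH = matmul(W, H)
    return math.sqrt(sum((v - x) ** 2 for vr, xr in zip(V, WH) for v, x in zip(vr, xr)))


def topic_proportions(W):
    """Normalise each row of W into a per-text topic distribution (theta)."""
    out = []
    for row in W:
        s = sum(row) or 1.0
        out.append([w / s for w in row])
    return out


def augment_with_topics(bow, theta):
    """Hypothetical TVAE-style input: L1-normalised term counts followed by the
    text's topic proportions, which stand in for the missing context."""
    total = sum(bow) or 1.0
    return [c / total for c in bow] + list(theta)


def reparameterize(mu, logvar, rng):
    """z = mu + sigma * eps with eps ~ N(0, 1) per dimension (reparameterization trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def gaussian_kl(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), the regulariser in the VAE objective."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def cosine(u, v):
    """Cosine similarity between two representation vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


if __name__ == "__main__":
    # Toy doc-term counts: rows are short texts, columns are vocabulary terms.
    V = [[3, 1, 0, 0], [2, 2, 0, 0], [0, 0, 4, 1], [0, 1, 3, 2]]
    W, H = nmf(V, k=2)
    theta = topic_proportions(W)
    X = [augment_with_topics(bow, t) for bow, t in zip(V, theta)]
    print("cos(text0, text1) =", round(cosine(X[0], X[1]), 3))
    print("cos(text0, text2) =", round(cosine(X[0], X[2]), 3))
```

In a full model, an encoder network would map each augmented vector to `mu` and `logvar`, training would add `gaussian_kl` to a reconstruction term, and one common choice is to compare texts by the cosine similarity of their `mu` vectors. Here cosine is applied directly to the augmented inputs only to keep the example self-contained.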

Keywords/Search Tags:

semantic representation, Variational Auto-encoder, topic model, semantic similarity, Mongolian processing


Related items

1	Research On Neural Topic Modeling Method Based On Variational Auto-Encoder
2	Research On Semantic Representation Of Text Based On Topic Model
3	Research And Application Of Representation Learning Based On Variational Auto-encoder
4	Research On Topic Modeling Method Based On Semantic Distribution Similarity
5	The Research On Chinese Sentential Semantic Model Parsing And Text Representation
6	Research On The Homonyms Disambiguation Algorithm Based On Mongolian Nouns Semantic Network
7	Text Semantic Similarity Algorithm Based On Transformer
8	Variational Auto-Encoder Based Attributed Network Representation Learning And Deep Embedded Clustering
9	Image Hashing Retrieval Based On Auto-Encoder
10	Image And Text Joint Modeling Method Based On Multimodal Weibull Variational Auto-Encoder