Research On Multi-source Text Topic Mining Algorithm

Posted on:2020-08-24

Degree:Master

Type:Thesis

Country:China

Candidate:L Y Xu

GTID:2428330596473185

Subject:Computer Science and Technology

Abstract/Summary:

With the rapid development of the Internet, people obtain text information from many network channels every day, so processing text from multiple sources has become an important task. Most traditional topic mining models are designed for single-source text data and are difficult to apply effectively to multi-source data, whose form is more complex. Text from different sources shares certain similarities in the distribution of topic information, but the vocabulary that expresses a topic differs markedly between sources. Traditional models cannot make good use of the correlation in topic knowledge across sources, and they struggle to reconcile the different representations of the same topic in different sources. To better understand multi-source text, we propose a novel topic model for multi-source text data based on the Dirichlet Multinomial Allocation model, namely MSDMA. It has three main advantages: 1) it learns topic information from several sources at the same time, discovering the latent relationships between sources in their topic knowledge while retaining the differences in how each source expresses a topic in its vocabulary; 2) through transfer learning, fusing data sources of different quality improves topic discovery on low-quality sources with high noise and little information; 3) it learns the number of topics in each source automatically, which adapts better to multi-source data than the traditional approach of setting the number manually. Based on MSDMA, the ?-MSDMA model is designed. Its modeling process has two main parts. First, the MSDMA model is trained on part of the data set. After training, the prior parameters of the topic-word distribution are updated. The new prior parameters are then applied to the new data set so that the model describes the observed data more accurately and discovers topics faster and more effectively. Finally, through large-scale experiments on simulated and real data sets, we show that our method mines the topics of multi-source text more effectively than traditional mainstream methods.
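
The abstract does not say how the prior parameters of the topic-word distribution are updated after the first training stage. As a rough illustration only, and not the thesis's actual update rule, the sketch below assumes one simple scheme: each topic's prior for a word becomes a base value plus that word's relative frequency among the documents assigned to the topic in the first stage. The function name `updated_topic_word_prior` and the parameters `base_beta` and `weight` are hypothetical.

```python
from collections import Counter


def updated_topic_word_prior(docs, assignments, num_topics, vocab,
                             base_beta=0.1, weight=1.0):
    """Build per-topic Dirichlet priors over words from first-stage results.

    Assumed (hypothetical) rule, not taken from the thesis:
        prior[k][w] = base_beta + weight * count(k, w) / total_count(k)

    docs        -- list of tokenized documents (lists of words)
    assignments -- topic index assigned to each document in the first stage
    """
    # Count how often each word appears in the documents of each topic.
    counts = [Counter() for _ in range(num_topics)]
    for doc, topic in zip(docs, assignments):
        counts[topic].update(doc)

    priors = []
    for topic in range(num_topics):
        total = sum(counts[topic].values())
        priors.append({
            # Topics with no documents keep the symmetric base prior.
            word: base_beta + (weight * counts[topic][word] / total if total else 0.0)
            for word in vocab
        })
    return priors


# Toy first-stage output: two documents, one per topic; topic 2 is empty.
docs = [["a", "b", "a"], ["c"]]
priors = updated_topic_word_prior(docs, [0, 1], 3, ["a", "b", "c"])
```

Under this assumption, words that dominated a topic in the first stage receive a larger prior in the second stage, which is one way a model could start closer to the observed data and converge faster.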

Keywords/Search Tags:

multi-source text, Dirichlet Multinomial Allocation model, topic model, text mining

Related items

1	Research On Text Mining Based On Topic Model
2	Research And Implementation Of Distributed Topic Clustering Technology For Text Flow
3	Research And Application Of Topic Evolution Model Based On LDA
4	Model-based Algorithms For Text Clustering
5	Study Of Text Evolution Analysis And Prediction Based On Topic Model
6	Research On Topic Model Based Patent Mining And Its Applications
7	Research And Application Of Text Classification Model Based On Topic Model
8	Research On Classification Algorithm Of Scientific Papers Based On Topic Model
9	Chinese Text Classification Method Based On Improved Topic Model
10	Research Of Weibo Topic Detection Model Based On Dirichlet Regression