The Research And Implementation Of Text Similarity Computing Based On Topic Model

Posted on:2013-06-27

Degree:Master

Type:Thesis

Country:China

Candidate:C N Sun

GTID:2248330371499433

Subject:Computer software and theory

Abstract/Summary:

The Internet has entered the mobile era: not only traditional PCs but also cell phones, tablets, and other mobile devices can access it. Computer information processing has entered the age of big data. Much of this data is text, such as Google search logs, the daily updates on Twitter and microblogs, and the content users generate every day on Facebook and Tencent, and its volume is measured in terabytes rather than gigabytes. How to analyze data at this scale to support corporate decision-making or improve the user experience is the central problem. This thesis focuses on text similarity computation; its main goal is a robust similarity measure that applies to as wide a range of applications as possible. We first introduce the vector space model and its problems, then explore solutions to those problems. The main work is as follows:

First, we briefly introduce the basic principles of the vector space model and the similarity computation based on it. In the same way, we briefly introduce the topic model and the similarity computation based on it. We also describe in detail the set-theoretic and algebraic significance of the topic model, from which it can be seen that, compared with the vector space model, the topic model has a richer mathematical and statistical foundation.

Second, we briefly introduce the LSI, pLSI, and LDA models and their parameter estimation methods. Topic models that build on LDA are only just emerging; this thesis surveys recent progress on topic models along three lines, among them adding new observed variables suited to the characteristics of the task and introducing semantic information.

This thesis then describes a pLSI-based word co-occurrence clustering algorithm and models texts on the basis of co-occurring phrases. The algorithm rests on the assumption that the more co-occurring phrases two texts share, the greater their similarity; experiments verify that the algorithm is valid.

Finally, we present a method for modeling Chinese text based on the LDA model. The experiments use the Gibbs sampling algorithm to infer the topic space of the corpus and each text's representation in that topic space, and use the JS (Jensen-Shannon) distance to measure the similarity between texts. Experiments show that this method performs better than traditional methods based on the vector space model.
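
The contrast between the two similarity measures named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names are hypothetical, the vector space model is reduced to raw term counts with cosine similarity, and the topic distributions are assumed to come from an already trained LDA model. Whether the thesis uses the JS divergence itself or a transformation of it is not stated in the abstract, so the base-2 divergence and the `1 - divergence` similarity below are assumptions made for illustration.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity of two token lists under a bag-of-words vector space model."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    # Only terms present in both documents contribute to the dot product.
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two topic distributions (base 2, range [0, 1])."""
    if len(p) != len(q):
        raise ValueError("topic distributions must have the same number of topics")
    # Mixture distribution; it is positive wherever p or q is positive,
    # so the KL terms below never divide by zero.
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def js_similarity(p, q):
    """Assumed similarity score: 1 minus the base-2 JS divergence."""
    return 1.0 - js_divergence(p, q)


if __name__ == "__main__":
    # Two documents with no words in common (hypothetical example).
    doc_a = ["car", "engine", "road"]
    doc_b = ["automobile", "motor", "highway"]
    print("cosine (vector space model):", cosine_similarity(doc_a, doc_b))

    # Hypothetical per-document topic distributions, as LDA inference might produce.
    theta_a = [0.8, 0.1, 0.1]
    theta_b = [0.7, 0.2, 0.1]
    print("JS similarity (topic space):", js_similarity(theta_a, theta_b))
```

In this toy case the two documents share no terms, so the term-count cosine is zero, while their assumed topic distributions are close and the JS-based score is high. This is the kind of vocabulary mismatch that a topic-space comparison is meant to address.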

Related items

1	Research On Short Text Topic Discovery Based On BTM Topic Model
2	Research On Fast Gibbs Sampling Topic Inference Algorithms For Topic Models
3	Text Semantic Mining Based On Topic Model
4	The Research And Implementation Of Topic Evolution Based On LDA
5	Research On The Topic Clustering Of Network Short Text
6	Research On Topic Models Combining Internal Feature And External Information Of Texts
7	Research On Learning Methods Based On Topic Model And Its Application In User Portraits
8	Research On Semi-supervised Topic Model For Text Classification
9	Research On Extracting Speech Topic Based On Topic Model
10	MCMC Method And Its Application In The Text Topic Modeling