Research On Extracting Speech Topic Based On Topic Model

Posted on:2016-12-20

Degree:Master

Type:Thesis

Country:China

Candidate:Q Tang

Full Text:PDF

GTID:2308330461955989

Subject:Control Science and Engineering

Abstract/Summary:

This thesis studies the process of speech topic extraction, which consists of speech data preprocessing, text representation, feature extraction, parameter estimation, model training and topic classification. The model is implemented and simulated with Gibbs-LDA++ and libsvm.

Speech data preprocessing mainly includes speech-to-text conversion, word segmentation, stop-word removal and word frequency statistics. Speech conversion yields the text data. ICTCLAS is then used to segment the words and remove stop words, which reduces interference from meaningless words and shrinks the amount of data. After segmentation and stop-word removal, word frequencies are counted to simplify later processing and to assign weights to the words.

Text representation and feature extraction affect how well the computer can process the data and extract information from it. The text is represented with the vector space model, which is commonly used in natural language processing and has reliable theoretical support. Feature extraction uses an improved χ² statistic method, which selects features mainly by the relationship between features and categories and avoids losing important information.

After feature extraction, parameter estimation and model training are performed on the feature set. Parameter estimation provides the three parameters the LDA model needs: φ, β and T. φ and β cannot be obtained directly in LDA and can only be computed with an approximate algorithm, so Gibbs sampling, a Markov chain Monte Carlo (MCMC) method, is used. T is the number of topics and must be set in advance, which raises the question of which value is best. Using an optimized DBSCAN algorithm, sample density is used to determine the relationship between different topics and to choose the optimal number of topics. This improves performance and reduces the number of iterations.

Once the parameters are obtained, the LDA model is trained to generate a latent topic-text matrix that serves as input to the SVM. Finally, topic extraction experiments on Chinese and English speech data are carried out with Gibbs-LDA++ and libsvm. The experimental results and the performance evaluation demonstrate the superiority and effectiveness of speech topic extraction based on the topic model.
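The Gibbs sampling step mentioned in the abstract can be illustrated with a minimal sketch of collapsed Gibbs sampling for LDA. This is not the thesis code and not the Gibbs-LDA++ implementation. The function name `lda_gibbs`, the symmetric priors `alpha` and `beta`, the output names `theta` (document-topic matrix) and `phi` (topic-word matrix), and the toy word ids are all assumptions that follow the common textbook formulation, which may differ from the notation used in the thesis.

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01,
              iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch).

    docs is a list of documents, each a list of integer word ids in
    range(vocab_size). alpha and beta are assumed symmetric Dirichlet priors.
    Returns (theta, phi): document-topic and topic-word distributions.
    """
    rng = random.Random(seed)
    n_dk = [[0] * num_topics for _ in docs]               # doc-topic counts
    n_kw = [[0] * vocab_size for _ in range(num_topics)]  # topic-word counts
    n_k = [0] * num_topics                                # tokens per topic
    z = []                                                # topic of each token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token from the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                    / (n_k[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                # Resample the topic and restore the counts.
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    # Point estimates from the final counts.
    phi = [[(n_kw[k][w] + beta) / (n_k[k] + vocab_size * beta)
            for w in range(vocab_size)] for k in range(num_topics)]
    theta = [[(n_dk[d][k] + alpha) / (len(doc) + num_topics * alpha)
              for k in range(num_topics)] for d, doc in enumerate(docs)]
    return theta, phi


# Toy corpus with hypothetical word ids 0..3.
docs = [[0, 1, 0, 1], [2, 3, 3, 2], [0, 0, 1]]
theta, phi = lda_gibbs(docs, num_topics=2, vocab_size=4, iters=50)
```

In a pipeline like the one described above, each row of `theta` could plausibly serve as the per-document feature vector passed to an SVM classifier. Here the number of topics is fixed by hand, whereas the thesis selects it with a density-based procedure.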

Keywords/Search Tags:

LDA model, topic extraction, Gibbs sampling, topic

Related items

1	Research On Extracting Speech Topic Based On Topic Model
2	Research On Short Text Topic Discovery Based On BTM Topic Model
3	Topic Mining And Prediction From Microblogs Based On Topic Model
4	Research And Implementation On Large-Scale Distributed LDA Topic Model
5	Research And Implementation Of Personalized News Recommendation Algorithm Based On Improved LDA Topic Model
6	The Research And Implementation Of Topic Evolution Based On LDA
7	Research On Topic Models Combining Internal Feature And External Information Of Texts
8	An Efficient Keywords Extraction Algorithm For Text Comprehension
9	Online Belief Propagation Algorithm Research Of Topic Models
10	Research On Learning Methods Based On Topic Model And Its Application In User Portraits