Research on the Method and Technique of Chinese and Thai Cross-Language Topic Detection

Posted on: 2016-07-28
Degree: Master
Type: Thesis
Country: China
Candidate: J Shi
GTID: 2208330470470610
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the information society, people increasingly get their news from the Internet. Topic detection techniques were proposed to help readers find the information they need in this mass of data; their main purpose is to detect the topic that a news text describes. Because the Internet is multilingual, people are no longer satisfied with information in a single language, so cross-language topic detection, which aims to detect topics in news texts written in different languages, is attracting increasing attention from researchers.

This thesis first proposes a WordNet-based method for calculating Chinese-Thai cross-language text similarity. The method computes the similarity between Chinese news texts and Thai news texts in order to obtain pairs of similar Chinese and Thai texts, which are used in the next step to construct the Chinese-Thai cross-language joint LDA model. The Chinese and Thai texts are first preprocessed and their features selected. The multilingual dictionary WordNet is then used to convert both the Chinese text and the Thai text into a middle-layer language, and the similarity between the two texts is computed in that middle layer. Experimental results show that this method of computing the similarity between Chinese and Thai texts achieves an accuracy of 82%, which is a good result.

Based on the similar Chinese-Thai news texts, we model the news texts with LDA to obtain the Chinese-Thai cross-language joint LDA model. Gibbs sampling is then used to solve the joint LDA model, inferring the model parameters and the topic distribution of each text. The joint LDA model is used to complete the two subtasks of cross-language topic detection.

Finally, experiments are designed and analyzed, and the experimental results verify the feasibility of the proposed algorithm.
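The middle-layer similarity step described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not in the thesis: the two small lexicons `ZH_TO_CONCEPT` and `TH_TO_CONCEPT` are hypothetical stand-ins for a multilingual WordNet lookup, the concept IDs are illustrative labels, the texts are assumed to be already segmented into tokens, and cosine similarity is used only as one plausible measure, since the abstract does not name the measure applied in the middle layer.

```python
import math
from collections import Counter

# Hypothetical toy lexicons mapping words to shared concept IDs.
# They stand in for a multilingual WordNet lookup (an assumption, not the thesis data).
ZH_TO_CONCEPT = {"洪水": "flood", "政府": "government", "救援": "rescue"}
TH_TO_CONCEPT = {"น้ำท่วม": "flood", "รัฐบาล": "government", "ฟุตบอล": "football"}


def to_concepts(tokens, lexicon):
    """Map tokens to middle-layer concept counts, skipping words not in the lexicon."""
    return Counter(lexicon[t] for t in tokens if t in lexicon)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors; 0.0 if either is empty."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cross_language_similarity(zh_tokens, th_tokens):
    """Compare a Chinese and a Thai token list in the shared concept space."""
    zh_vec = to_concepts(zh_tokens, ZH_TO_CONCEPT)
    th_vec = to_concepts(th_tokens, TH_TO_CONCEPT)
    return cosine(zh_vec, th_vec)


if __name__ == "__main__":
    zh_doc = ["洪水", "政府", "救援"]
    th_doc = ["น้ำท่วม", "รัฐบาล"]
    print(round(cross_language_similarity(zh_doc, th_doc), 3))
```

Document pairs whose score exceeds a chosen threshold could then be treated as similar and passed on to the joint topic model; the threshold itself would have to be tuned on labeled data.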
Keywords/Search Tags: Cross-language topic detection, Cross-language text similarity, WordNet, Cross-language joint LDA model