
Research on Chinese-Thai Bilingual News Topic Discovery Methods

Posted on: 2017-07-02
Degree: Master
Type: Thesis
Country: China
Candidate: J P Zhang
Full Text: PDF
GTID: 2358330488465627
Subject: Computer application technology
Abstract/Summary:
With the arrival of the "Internet Plus" and big data era and the advance of globalization, the number of Internet users in China has soared, and getting real-time news hot spots from the Internet has become the norm for ordinary people. Users of different languages can obtain current information in their own language, but to follow news from other countries they must rely on knowledge of the relevant language or on translation tools, which hinders communication between users of different languages. Cross-language news topic detection has therefore become a hot research issue in natural language processing. Based on the linguistic features of Chinese and Thai, this thesis studies Chinese-Thai word similarity computation over a cross-language corpus, Chinese-Thai bilingual entity alignment, and Chinese-Thai bilingual news topic discovery. The main work is as follows:
(1) Distributed word representations for Chinese and Thai. After analyzing the features of the two languages and of news descriptions, cross-language data is generated by weakly supervised extended learning. In this data, Thai nouns and verbs are treated as special Chinese nouns and verbs, so that words of both languages are learned iteratively in the same neural probabilistic language model. The resulting Chinese-Thai word distribution model reflects the cosine similarity between Chinese and Thai words.
(2) Chinese-Thai bilingual entity alignment, focusing on person names and location names. Three alignment methods are proposed. The first is fuzzy matching based on bilingual entity similarity. The second matches Thai entities against Chinese entity patterns, using the similarity of bilingual entity word-sequence patterns. The third relies on the consistency of knowledge information between Thai and Chinese entities: the Apriori algorithm mines the knowledge-information words of Chinese entities, and a naive Bayes bilingual entity alignment model is built to align named entities in the Chinese-Thai corpus. Finally, rules combine the advantages of the three methods to achieve the best results.
(3) Chinese-Thai bilingual news topic detection. Building on work (1) and (2), a bilingual topic discovery method based on maximum clique clustering is proposed. Experiments comparing it with existing cross-lingual topic discovery methods show that maximum clique clustering achieves good results.
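The abstract does not spell out how cosine similarity and clique-based clustering fit together, so the following is only a minimal sketch under stated assumptions, not the thesis's implementation. It assumes each news document is already represented by a vector in a shared Chinese-Thai space, links two documents when their cosine similarity reaches a hypothetical threshold, and treats each maximal clique of the resulting graph as a candidate topic. The document names, vectors, threshold, and the use of the basic Bron-Kerbosch enumeration are all illustrative choices.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_graph(vectors, threshold):
    """Link two documents when their cosine similarity reaches the threshold."""
    names = list(vectors)
    graph = {name: set() for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                graph[a].add(b)
                graph[b].add(a)
    return graph


def maximal_cliques(graph):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates that extend it, x: already explored
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(graph), set())
    return cliques


# Hypothetical toy vectors: "zh_*" are Chinese documents, "th_*" are Thai ones.
docs = {
    "zh_1": [0.90, 0.10, 0.00],
    "th_1": [0.80, 0.20, 0.00],
    "zh_2": [0.85, 0.15, 0.05],
    "zh_3": [0.00, 0.10, 0.90],
    "th_2": [0.05, 0.00, 0.95],
}

# The 0.9 threshold is an assumed value chosen for this toy data.
topics = maximal_cliques(similarity_graph(docs, threshold=0.9))
for topic in topics:
    print(sorted(topic))
```

On this toy data the graph splits into two groups of mutually similar documents, each mixing Chinese and Thai items. Requiring a clique is stricter than requiring a connected component, because every pair of documents in a topic must be similar to each other.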
Keywords/Search Tags: Chinese, Thai, weakly supervised extended learning, cross-language word distribution, entity alignment, maximum clique, cross-language topic detection