
Research On Topic Detection And Tracking Technology Based On Uyghur Public Opinion Analysis

Posted on: 2019-06-13  Degree: Master  Type: Thesis
Country: China  Candidate: L Tian  Full Text: PDF
GTID: 2428330566967003  Subject: Computer application technology
Abstract/Summary:
With the popularity of the Internet and the rise of big data, text processing technologies of all kinds have matured. Effectively extracting information from large volumes of text has great potential value in many fields, such as business, society and daily life. China is home to many ethnic groups. Some of them, such as the Uygur, live in relatively concentrated communities and have their own customs and languages. Statistical analysis of forums and news websites in minority languages makes it possible to learn the trend of local public opinion more quickly and effectively, and to support future public opinion guidance and policy.

The paper first introduces the technologies and theory applied in topic detection. It then takes into account the features of data from various websites and forums in Xinjiang, selects appropriate techniques to model the texts, and computes text similarity and clusters the texts with a text clustering algorithm. To make the clustering results of multiple batches comparable, a large number of word vectors are trained as a "base quantity" before training on the different texts. Once the topic center has been determined, the similarity to that center is used to track the topic directly.

In this paper, we make some improvements to the Doc2VecC model so that text vectors produced in different experiments can be compared directly. Topic detection and tracking is a dynamic process: the feature vector of each text has to be computed after the text is obtained, which requires all experiments to share the same "background". We propose using the "base quantity" as a fixed set of word vectors in all experiments, and assigning random vectors to new words that are not in the base quantity.

Experimental results show that the text feature vectors produced by the Doc2VecC model are superior to the traditional feature vectors generated by the TF-IDF algorithm. Text vectors built on the "base quantity" keep the results of different experiments comparable, so the similarity between a later text and the center vector of a hot topic can be computed. This is largely sufficient to determine whether the later text belongs to the hot topic.
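The tracking idea described above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the vocabulary, the vector dimension, the similarity threshold and all function names are hypothetical, the "base quantity" is filled with random numbers instead of trained word vectors, and the text vector is taken as a plain average of word vectors, which is only an assumed simplification of the Doc2VecC representation.

```python
import numpy as np

DIM = 8  # hypothetical vector dimension
rng = np.random.default_rng(0)

# Stand-in for the fixed "base quantity". In the thesis these word vectors
# are trained in advance; random values are used here only for illustration.
base_vectors = {
    w: rng.normal(size=DIM)
    for w in ["market", "price", "weather", "rain", "festival"]
}


def word_vector(word, table, rng):
    """Return the fixed vector for a known word. A word missing from the
    table gets a random vector, which is stored so later texts reuse it."""
    if word not in table:
        table[word] = rng.normal(scale=0.1, size=DIM)
    return table[word]


def text_vector(tokens, table, rng):
    """Average the word vectors of a non-empty token list (assumed
    simplification of a Doc2VecC-style text representation)."""
    return np.mean([word_vector(t, table, rng) for t in tokens], axis=0)


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def is_on_topic(tokens, center, table, rng, threshold=0.5):
    """Track a topic by comparing a new text with the topic center.
    The threshold value is an arbitrary choice for this sketch."""
    return cosine(text_vector(tokens, table, rng), center) >= threshold


# Working table: starts as a copy of the base quantity and grows with new words.
table = dict(base_vectors)

# Hypothetical topic center: the mean vector of texts already clustered together.
center = np.mean(
    [
        text_vector(["market", "price"], table, rng),
        text_vector(["price", "market", "festival"], table, rng),
    ],
    axis=0,
)

# A later text containing one word ("inflation") that is not in the base quantity.
later_text = ["market", "price", "inflation"]
print(is_on_topic(later_text, center, table, rng))
```

Because every batch looks up known words in the same table, text vectors computed at different times lie in the same space, which is what makes a direct similarity comparison with an earlier topic center meaningful.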
Keywords/Search Tags:Topic, Text feature vector, Clustering, Uygur language