Research On Topic Detection And Tracking Technology Based On Uyghur Public Opinion Analysis

Posted on:2019-06-13

Degree:Master

Type:Thesis

Country:China

Candidate:L Tian

Full Text:PDF

GTID:2428330566967003

Subject:Computer application technology

Abstract/Summary:

With the spread of the Internet and the rise of big data, text processing technologies of all kinds have matured. Effectively extracting information from large volumes of text has great potential value in fields such as business, society, and daily life. China has many ethnic groups. Some of them, such as the Uygur, live in relatively concentrated communities and have their own customs and languages. Statistical analysis of minority-language forums and news websites makes it possible to learn trends in local public opinion more quickly and effectively, and to support future public opinion guidance and policy making. This thesis first introduces the techniques and theory used in topic detection. It then considers the characteristics of data from various websites and forums in Xinjiang, selects appropriate techniques to model them, and computes text similarity and clusters the texts with a text clustering algorithm. To make the clustering results of multiple batches comparable, a large set of word vectors is trained in advance as a "base quantity" before training on different texts. Once a topic center has been determined, similarity to it is used directly to track the topic. We make some improvements to the Doc2VecC model so that text vectors produced in different experiments can be compared directly. Topic detection and tracking is a dynamic process: feature vectors must be computed for texts as they are obtained, which requires all experiments to share the same "background". We propose using the "base quantity" as a fixed set of word vectors in all experiments and assigning random vectors to new words that are not in the base quantity. Experimental results show that the text feature vectors produced by the Doc2VecC model outperform the traditional feature vectors generated by the TF-IDF algorithm. Text vectors computed on top of the "base quantity" keep the results of multiple experiments comparable, and computing similarity to the center vector of a hot topic can basically determine whether a later text belongs to that hot topic.
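
The fixed-background idea described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the toy vectors, the dimensionality, the random range for new words, the 0.8 threshold, and every function name are assumptions made for illustration. The sketch uses the averaging of word vectors that Doc2VecC uses to represent a document, keeps the base word vectors frozen, gives unseen words a cached random vector, and tracks a topic by cosine similarity to its center.

```python
import math
import random

DIM = 4  # toy dimensionality (assumption); real word vectors are much larger

# Hypothetical "base quantity": word vectors trained once, then kept fixed.
BASE_VECTORS = {
    "market": [0.9, 0.1, 0.0, 0.0],
    "price": [0.8, 0.2, 0.0, 0.1],
    "rain": [0.0, 0.1, 0.9, 0.2],
    "flood": [0.1, 0.0, 0.8, 0.3],
}

_rng = random.Random(0)  # seeded so new-word vectors are reproducible
_new_word_vectors = {}  # cache: a new word keeps the same vector across batches


def word_vector(word):
    """Return the fixed base vector, or a cached random vector for a new word."""
    if word in BASE_VECTORS:
        return BASE_VECTORS[word]
    if word not in _new_word_vectors:
        _new_word_vectors[word] = [_rng.uniform(-0.1, 0.1) for _ in range(DIM)]
    return _new_word_vectors[word]


def text_vector(tokens):
    """Represent a tokenized text as the average of its word vectors."""
    if not tokens:
        return [0.0] * DIM
    vectors = [word_vector(token) for token in tokens]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def topic_center(texts):
    """Center of a topic cluster: the mean of its text vectors."""
    vectors = [text_vector(tokens) for tokens in texts]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def belongs_to_topic(tokens, center, threshold=0.8):
    """Track a later text by comparing it with the topic center (threshold is an assumption)."""
    return cosine(text_vector(tokens), center) >= threshold


# Usage: build a center from one batch, then test texts from a later batch.
center = topic_center([["market", "price"], ["price", "market", "price"]])
on_topic = belongs_to_topic(["market", "price", "newword"], center)
off_topic = belongs_to_topic(["rain", "flood"], center)
```

Because every batch reads the same frozen base vectors, text vectors computed at different times lie in the same space, which is what allows a later text to be compared directly with an earlier topic center.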

Keywords/Search Tags:

Topic, Text feature vector, Clustering, Uygur language

Related items

1	Research On The Construction Method Of Technology Domain Thematic Library Based On Multilevel Topic Vector
2	Text Classification Based On Word Vector And Topic Vector
3	Design And Implementation Of News Hot Topic Discovery System Based On Multi-Class Text
4	Research On Multi-Level Topic Clustering Based On Cross Degree
5	The Design And Implementation Of The Hot Education News Topic Detection System
6	Research On Bilingual Topic Model And Its Algorithm In Cross-language Information Retrieval
7	Study On Text Clustering Based On Topic Sentence Vector Model
8	Research Of Automatic Summarization Oriented To News Text
9	A SOM-based Text Clustering And Apply To Search Result
10	A Research On Text Vector Representations And Modelling Based On Neural Networks