A Topic Tracking Algorithm For Micro-blog Based On Micro-blog Topic Summarization

Posted on: 2020-06-12
Degree: Master
Type: Thesis
Country: China
Candidate: H Z Zhu
Full Text: PDF
GTID: 2428330575455442
Subject: Computer Science and Technology
Abstract/Summary:
Micro-blog platforms have huge user communities that can join discussions anytime and anywhere and express themselves freely, so micro-blog content is fragmented and time-sensitive. As a result, users cannot quickly extract the main content, causes, and outcomes of a current topic from the mass of micro-blog information. To solve this problem, this dissertation proposes a topic tracking algorithm based on micro-blog topic summaries. The algorithm combines the characteristics of micro-blog text with the needs of users to improve the traditional topic tracking process (preprocessing, text model construction, similarity analysis, and topic drift detection). First, an extraction algorithm for new login words (out-of-vocabulary new words), based on traditional statistics and the N-ary incremental algorithm, is proposed to make Chinese word segmentation in preprocessing more precise. Second, an algorithm for constructing and optimizing topic summaries is proposed to make the text model more detailed. Finally, adaptive tracking and topic drift detection of micro-blog topics are realized through similarity analysis. The main research contents of this dissertation are as follows.

(1) A new login word extraction algorithm based on traditional statistics and the N-ary incremental algorithm is proposed, to remedy the shortcoming that traditional statistics alone cannot effectively extract new login words. First, the left and right entropies of words in micro-blog texts are analyzed, and words that occur frequently in micro-blog texts are merged with the traditional stop words to form a quasi stop word list. Then, the improved statistics are used to filter out junk strings while searching for frequent strings.

(2) An algorithm for constructing and optimizing a micro-blog topic summary is proposed. First, the TF-IDF values of words and the word information (term information and inter-word information) are analyzed to extract micro-blog keywords. Then, the scale of the topic summary is reduced with a confidence threshold, and the integrity of the topic summary is assessed through the connectivity of its sub-topics. The optimal topic summary is found by weighing its scale against the integrity of its expression.

(3) Self-adaptive tracking and topic drift detection of micro-blog topics are realized. First, a time window is introduced to divide the micro-blog text into N equal-time texts, which are assigned to a training set and a test set. Next, the corresponding micro-blog topic summaries (a query summary and a feedback summary) are constructed for the equal-time texts in the training set and the test set. Finally, the similarity between the query summary and the feedback summary is analyzed, and the query summary is updated according to the similarity level.

The experimental results show that the topic tracking algorithm based on micro-blog topic summaries not only extracts new login words accurately but also quickly constructs a complete and concise topic summary. As a result, micro-blog topics can be tracked accurately and continuously, and users can quickly understand the causes, outcomes, and future trends of current topics. Based on the characteristics of micro-blog text and the shortcomings of the traditional algorithms, this dissertation improves those algorithms so that new login words can be extracted accurately from ultra-short, "fragmented" text and topic summaries can be constructed quickly. In addition, this dissertation tracks topics in the form of micro-blog topic summaries, which lets users efficiently obtain more detailed topic content.

Figures: 11. Tables: 8. References: 64.
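The left and right entropies mentioned in item (1) measure how varied the characters are that appear immediately before and after a candidate string. The sketch below illustrates that standard measure only; it is not the dissertation's implementation. The function names `branch_entropy` and `_entropy`, the boundary markers `<s>` and `</s>`, and the choice of base-2 logarithms are all assumptions made for illustration.

```python
import math
from collections import Counter


def _entropy(counts):
    """Shannon entropy (base 2) of a Counter of neighbor frequencies."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def branch_entropy(texts, candidate):
    """Return (left_entropy, right_entropy) of `candidate` over `texts`.

    Hypothetical helper: counts the single character on each side of every
    occurrence of `candidate`, then computes the entropy of each side.
    """
    left, right = Counter(), Counter()
    for text in texts:
        start = text.find(candidate)
        while start != -1:
            end = start + len(candidate)
            # Assumed convention: use sentinel markers at text boundaries.
            left[text[start - 1] if start > 0 else "<s>"] += 1
            right[text[end] if end < len(text) else "</s>"] += 1
            start = text.find(candidate, start + 1)
    return _entropy(left), _entropy(right)
```

A string whose neighbors vary widely on both sides (high left and right entropy) behaves like a free-standing word, whereas a string that is almost always preceded or followed by the same character is more likely a fragment of a longer word. How the dissertation thresholds or combines these values is not stated in the abstract.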
Keywords/Search Tags: topic tracking, new login words, microblog topic summary, N-ary incremental algorithm, confidence threshold