
Design And Implementation Of Microblog Information Visualization System

Posted on: 2020-06-17
Degree: Master
Type: Thesis
Country: China
Candidate: J H Li
Full Text: PDF
GTID: 2428330578952716
Subject: Software engineering
Abstract/Summary:
With the development of the mobile Internet, the number of users on microblog social platforms has grown rapidly. According to Sina Weibo's second-quarter earnings report last year, its monthly active users reached 431 million. Given this massive collection of microblog text, there is an urgent need to visualize and analyze search results accurately and efficiently according to the user's query demand (keywords, topics, and so on). At present, microblog search engines can only return results as a document list, so the biggest challenge they face is how to process all search results visually according to the query demand. To address this challenge, we designed a microblog information visualization system. The main research work consists of the following two parts.

First, topic trend and region analysis. We use the WebCollector crawler to download original microblog posts and preprocess them according to predefined rules. We then use the Chinese IK tokenizer plugin with a custom extension dictionary to perform word segmentation, disambiguation, and stop-word removal on each short text in the corpus, generating the corresponding inverted record table and dictionary. Next, we use the ELK technology stack to build a distributed index library and search engine platform, and store all the data from the inverted record table and dictionary in that platform. Finally, we take the result set returned by the retrieval system and use the visualization component Kibana to perform statistical analysis and visualization on it. This function helps users quickly obtain the geographical distribution of users, the development trend of a topic, and the degree of attention it receives from the result set.

Second, topic clustering. We first use the retrieval function to query all text results related to the target topic. We then use the Lucene framework to generate a corresponding inverted index from the returned text result set, a process that mainly includes word segmentation, ambiguity processing, and stop-word removal. Next, we use the SVD-based clustering algorithm in Lingo to generate several subtopic result sets in a top-down manner. Finally, we use the Carrot2 visualization component to visualize the subtopic result sets according to the number of subtopics, the clustering dispersion, and the text score threshold. With this function, users can adjust the clustering parameters to further clarify their query demand and obtain query results more efficiently.
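
Both parts of the system rely on an inverted index built after word segmentation and stop-word removal. The following is a minimal Python sketch of that idea only, not the thesis implementation: the thesis uses the IK tokenizer, Lucene, and the ELK stack, whereas here the whitespace tokenizer, the `STOP_WORDS` set, the sample posts, and the function names `tokenize`, `build_inverted_index`, and `search` are all hypothetical stand-ins chosen for illustration.

```python
from collections import defaultdict

# Hypothetical stop-word list; the thesis uses the IK tokenizer
# with a custom extension dictionary instead.
STOP_WORDS = {"the", "a", "of", "and", "in", "is"}


def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stop words."""
    tokens = []
    for raw in text.lower().split():
        word = raw.strip(".,!?;:\"'()")
        if word and word not in STOP_WORDS:
            tokens.append(word)
    return tokens


def build_inverted_index(posts):
    """Map each term to the sorted list of post ids that contain it."""
    index = defaultdict(set)
    for post_id, text in posts.items():
        for term in tokenize(text):
            index[term].add(post_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query):
    """Return ids of posts containing every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


# Made-up sample posts, used only to exercise the functions above.
posts = {
    1: "Typhoon warning issued in Shanghai",
    2: "Shanghai marathon registration opens",
    3: "Typhoon delays flights, travelers stranded",
}
index = build_inverted_index(posts)
```

In this sketch, a query is tokenized the same way as the posts, so a search for "typhoon shanghai" intersects the posting lists of both terms. A production index such as the one Lucene builds would also store term positions and frequencies for ranking, which this example omits.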
Keywords/Search Tags:visualization, topic clustering, microblog text, ELK technology stack