Design And Implementation Of Microblog Information Visualization System

Posted on:2020-06-17

Degree:Master

Type:Thesis

Country:China

Candidate:J H Li

Full Text:PDF

GTID:2428330578952716

Subject:Software engineering

Abstract/Summary:

With the development of the mobile Internet, the number of users on microblog social platforms is growing exponentially. According to Sina Weibo's second-quarter earnings report of last year, its monthly active users reached 431 million. Faced with this massive body of microblog text, there is an urgent need to visualize and analyze search results accurately and efficiently according to query demands such as keywords and topics. Currently, microblog search engines can only return results as a document list. The biggest challenge for these search engines is how to process all search results visually according to the query demand. To address this challenge, we designed a microblog information visualization system. The main research work consists of the following two parts.

First, topic trend and region analysis. We use the WebCollector crawler to download original microblog posts and preprocess them with a set of rules. We then use the Chinese IK tokenizer plugin with a custom extension dictionary to perform word segmentation, disambiguation, and stop-word removal on each short text in the corpus, and generate the corresponding inverted postings lists and dictionary. Next, the ELK technology stack is used to build a distributed index and search engine platform, in which all data from the postings lists and dictionary is stored. Finally, we obtain the result set returned by the retrieval system and use the visualization component Kibana to perform statistical analysis and visualization of the query result set. This function helps users quickly obtain the geographical distribution of users, the topic development trend, and the degree of attention from the result set.

Second, topic clustering. We first use the retrieval function to query all text result sets related to the target topic. We then use the Lucene framework to generate an inverted index from the returned text result set, a process that mainly includes word segmentation, ambiguity processing, and stop-word removal. Next, we use the SVD clustering algorithm in Lingo to generate several subtopic result sets in a top-down manner. Finally, we use the Carrot2 visualization component to visualize the subtopic result sets according to the number of subtopics, clustering dispersion, and text score threshold. With this function, users can adjust the clustering parameters to further clarify their query demand and obtain query results more efficiently.
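Both parts of the work described above rely on an inverted index built after word segmentation and stop-word removal. The following is a minimal sketch of that idea only, not the thesis implementation: it uses whitespace tokenization on English text as a stand-in for the IK tokenizer, and the stop-word list, the sample posts, and the function names `tokenize`, `build_inverted_index`, and `search` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical stop-word list; the thesis relies on the IK tokenizer's
# dictionaries for Chinese text instead.
STOP_WORDS = {"the", "a", "of", "is", "in", "and"}


def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stop words."""
    tokens = []
    for raw in text.lower().split():
        word = raw.strip(".,!?;:\"'()#")
        if word and word not in STOP_WORDS:
            tokens.append(word)
    return tokens


def build_inverted_index(posts):
    """Map each term to the sorted list of post IDs that contain it."""
    postings = defaultdict(set)
    for post_id, text in posts.items():
        for term in tokenize(text):
            postings[term].add(post_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search(index, query):
    """Return IDs of posts containing every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


# Made-up sample posts, for illustration only.
posts = {
    1: "Heavy rain in Beijing today",
    2: "The marathon in Beijing is postponed",
    3: "Rain delays the marathon",
}
index = build_inverted_index(posts)
```

With this sketch, `search(index, "rain marathon")` intersects the postings lists for the two terms. A production system such as the one described would delegate tokenization, indexing, and retrieval to Elasticsearch or Lucene rather than hand-rolled code.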

Keywords/Search Tags:

visualization, topic clustering, microblog text, ELK technology stack

Related items

1	Design And Implementation Of Microblog Hot Topic Found System
2	Research On Hotspot Detection Technology Of Microblogging Public Opinion Based On Text Clustering
3	Research On Hot Topic Detection Methods For Microblog
4	Research On Hot Topic Discovery Of Sina Microblog
5	Research On The Analysis Of Microblog Information Based On Text Clustering
6	Research On Microblog Topic Extraction Method Based On Text Semantic Information
7	Topic Visualization Based On Multi-Source Text Corpora
8	Research On LDA Short Text Clustering Algorithm For Microblog Comments
9	Research On Microblog Text Processing And Topic Analysis Methods
10	Research And Implementation Of Distributed Topic Clustering Technology For Text Flow