Visual Analytics Based On Topic Models

Posted on: 2020-11-18  Degree: Doctor  Type: Dissertation
Country: China  Candidate: Y Y Yan
GTID: 1368330572996875  Subject: Computer Science and Technology
Abstract/Summary:
With massive amounts of data being produced, collected, and stored, helping users understand and analyze data has become an active research area. Visual analytics is a method of data analysis and reasoning through interactive visual interfaces. However, visualizing massive data directly is difficult, so visual analytics often applies data analysis methods to extract latent patterns from the data and then visualizes those patterns. Topic models are a common method for extracting latent semantics from text data. The topic representation extracted for a document describes that document's characteristics well, and each topic describes a latent pattern in the text collection, which helps users understand the collection. Topic models are therefore widely employed in visual analytics. In topic-based visual analytics, however, topic models are usually adopted only as a data processing step, and the visualization merely displays their results; users cannot refine the results of the topic models through interaction. Besides text data, topic models are also widely used in the visual analytics of trajectory data. Because topic models rest on the "bag-of-words" assumption, they ignore the direction information of trajectories, and they cannot process OD (origin-destination) trajectory data. To address these shortcomings, this thesis studies visual analytics based on topic models. Its main results are as follows:
· Topic models usually cannot incorporate users' domain knowledge intuitively and effectively during the modeling process. To solve this problem, this thesis proposes an interactive visual analytics system that helps users refine generated topic models. First, we modify the hierarchical Dirichlet process to support word constraints. Then we display the generated topic models in a matrix view that visually reveals the underlying relationships between words and topics, and use semantic-preserving word clouds to help users identify word constraints effectively. Users can refine the topic models interactively by adding word constraints.
· Beyond word constraints, supervised topic models incorporate document labels: the labels act as constraints that refine the topics, and classification is performed on the topics. For supervised topic models, this thesis presents an interactive visual analytics system for incremental text classification based on a supervised topic modeling method, a modified Gibbs sampling maximum entropy discrimination latent Dirichlet allocation (Gibbs MedLDA). Given a text collection, Gibbs MedLDA generates topics that summarize the collection. The thesis designs a scatter plot that displays documents and topics simultaneously to show the topic modeling results, which helps users explore the structure of the text collection and decide which labels to create. After labels are created, Gibbs MedLDA is applied again to the labeled collection and produces both the topics and the classification results. The thesis also provides a scatter plot showing the classifier boundaries and a matrix view presenting the classifier weights. Users can iteratively label documents to refine each classifier and obtain more discriminative topics that are better suited to text classification.
· Through textualization, topic models can be applied not only to text data but also to trajectory data. However, because topic models are usually based on the "bag-of-words" assumption, they ignore the direction information of trajectories. To solve this problem, this thesis employs the bigram topic model to analyze textualized trajectories so that trajectory direction is taken into account. It further proposes a modified Apriori algorithm to extract frequent sub-trajectories and uses them to represent each topic as topical sub-trajectories. Finally, it designs a visual analytics system with several linked views that helps users interactively explore topics, sub-trajectories, and trips.
· Bike-sharing data is OD trajectory data: its trajectories cannot be converted by textualization into the discrete sequences that topic models are suited to handle. Therefore, this thesis constructs a tensor from the spatial, temporal, and user information of the bike-sharing data and employs tensor factorization to extract latent user activity patterns. To help users analyze and understand these patterns, a visual analytics system is designed for interactively exploring them along the spatial, temporal, and user dimensions and for comparing them within and between cities.
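Why the "bag-of-words" assumption loses trajectory direction, and why bigrams recover it, can be shown with a minimal sketch. It assumes a simple grid-based textualization: each point is mapped to a cell token and consecutive repeats are dropped. The thesis's actual textualization scheme, the function names, and the toy coordinates are all hypothetical. The sketch only covers the word and bigram counts that a topic model would consume, not the bigram topic model's inference. Two trips over the same cells in opposite directions yield identical word counts but different bigram counts.

```python
from collections import Counter

def textualize(trajectory, cell_size=1.0):
    """Hypothetical grid textualization (the thesis's scheme may differ):
    map each (x, y) point to a cell token like 'c2_0' and drop consecutive
    repeats, so a trip becomes a sequence of cell 'words'."""
    tokens = []
    for x, y in trajectory:
        token = f"c{int(x // cell_size)}_{int(y // cell_size)}"
        if not tokens or tokens[-1] != token:
            tokens.append(token)
    return tokens

def unigram_bag(tokens):
    """Bag-of-words view: how often each cell occurs, order discarded."""
    return Counter(tokens)

def bigram_bag(tokens):
    """Bigram view: counts of ordered (cell, next_cell) pairs."""
    return Counter(zip(tokens, tokens[1:]))

# Hypothetical trips over the same cells in opposite directions.
outbound = [(0.5, 0.5), (1.5, 0.5), (2.5, 0.5), (2.5, 1.5)]
inbound = list(reversed(outbound))

t_out, t_in = textualize(outbound), textualize(inbound)
print(t_out)
print("same bag of words:", unigram_bag(t_out) == unigram_bag(t_in))
print("same bigrams:", bigram_bag(t_out) == bigram_bag(t_in))
```

Both trips visit the same four cells, so their word counts match, while every ordered pair differs. That ordered-pair information is what a bigram-based model can use to separate the two directions.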
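The abstract mentions a *modified* Apriori algorithm for frequent sub-trajectories but does not describe the modifications. As a reference point, here is a plain Apriori-style level-wise miner of contiguous sub-trajectories over textualized trips. The support definition (the number of trips containing a sub-trajectory at least once), the function names, and the toy trips are assumptions for illustration, not the thesis's method.

```python
from collections import Counter

def contains(trip, sub):
    """True if `sub` occurs as a contiguous run of cells inside `trip` (tuples)."""
    n = len(sub)
    return any(trip[i:i + n] == sub for i in range(len(trip) - n + 1))

def frequent_subtrajectories(trips, min_support):
    """Level-wise (Apriori-style) mining of frequent contiguous sub-trajectories.

    Assumed support: number of trips that contain the sub-trajectory at least once.
    Returns a dict mapping each frequent sub-trajectory (tuple of cells) to its support."""
    trips = [tuple(t) for t in trips]
    # Level 1: count each cell once per trip that visits it.
    counts = Counter(cell for trip in trips for cell in set(trip))
    level = {(cell,): n for cell, n in counts.items() if n >= min_support}
    frequent = dict(level)
    while level:
        # Join step: extend `a` with the last cell of `b` when a's suffix equals
        # b's prefix. Both length-(k-1) pieces of the length-k candidate are then
        # frequent, which is the Apriori property for contiguous sequences.
        candidates = {a + b[-1:] for a in level for b in level if a[1:] == b[:-1]}
        # Count step: keep only candidates that reach the support threshold.
        level = {}
        for cand in candidates:
            support = sum(contains(trip, cand) for trip in trips)
            if support >= min_support:
                level[cand] = support
        frequent.update(level)
    return frequent

# Hypothetical textualized trips over cells A to D.
trips = [["A", "B", "C", "D"], ["A", "B", "C"], ["B", "C", "D"], ["D", "C", "B"]]
result = frequent_subtrajectories(trips, min_support=2)
for sub, support in sorted(result.items(), key=lambda kv: (-len(kv[0]), kv[0])):
    print("->".join(sub), support)
```

With `min_support=2`, B->C reaches support 3. The reversed C->B appears in only one trip, so it is not frequent. The longest frequent sub-trajectories are A->B->C and B->C->D. Because the mined units are ordered runs of cells, they keep the direction information that a plain bag of cells would lose.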
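The bike-sharing pipeline is to build a tensor from spatial, temporal, and user information, factorize it, and read off latent activity patterns. A minimal sketch can illustrate it. The abstract does not name the factorization, so this uses a plain CP (CANDECOMP/PARAFAC) decomposition fitted by alternating least squares in NumPy. The station × hour × user-group layout, the rank, and the synthetic data are all assumptions. Non-negativity, which real trip counts would usually call for, is not enforced here.

```python
import numpy as np

def cp_als(X, rank, n_iter=500, seed=0):
    """Rank-`rank` CP decomposition of a 3-way tensor by alternating least squares:
    X[i, j, k] ~= sum_r A[i, r] * B[j, r] * C[k, r]."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A, B, C = (rng.random((n, rank)) for n in (I, J, K))
    for _ in range(n_iter):
        # Each update solves a linear least-squares problem with the other two
        # factors fixed; einsum computes the tensor-times-Khatri-Rao product and
        # the Hadamard product of Gram matrices forms the normal equations.
        A = np.einsum("ijk,jr,kr->ir", X, B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = np.einsum("ijk,ir,kr->jr", X, A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = np.einsum("ijk,ir,jr->kr", X, A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C

def reconstruct(A, B, C):
    """Rebuild the full tensor from the three factor matrices."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)

# Hypothetical ground truth: two activity patterns (a morning and an evening peak)
# over 6 stations, 24 hours of the day, and 5 user groups.
rng = np.random.default_rng(42)
hours = np.arange(24)
stations_true = rng.random((6, 2))
hours_true = np.stack([np.exp(-(hours - 8) ** 2 / 4.0),
                       np.exp(-(hours - 19) ** 2 / 4.0)], axis=1)
users_true = rng.random((5, 2))
X = reconstruct(stations_true, hours_true, users_true)  # shape (6, 24, 5)

A, B, C = cp_als(X, rank=2)
rel_err = np.linalg.norm(X - reconstruct(A, B, C)) / np.linalg.norm(X)
peak_hours = sorted(int(h) for h in np.abs(B).argmax(axis=0))
print(f"relative error: {rel_err:.2e}")
print("peak hours of recovered patterns:", peak_hours)
```

Each column r of A, B, and C together forms one latent pattern: where (stations), when (hours), and who (user groups). CP factors are identifiable only up to permutation and scaling, so the sketch compares peak hours using absolute values and sorted order.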
Keywords/Search Tags: text visual analytics, trajectory visualization, visualization, topic models