
Brand Retrieval And Visualization In Large Scale Microblog Data

Posted on: 2016-04-24
Degree: Master
Type: Thesis
Country: China
Candidate: Y Guan
Full Text: PDF
GTID: 2308330461478537
Subject: Software engineering
Abstract/Summary:
With the development of Web 2.0 technology, the network has acquired features it did not have before. Firstly, there are more kinds of information. Textual data used to dominate the Internet, but in recent years more and more images, videos and audio data have appeared. Secondly, the amount of information on the Internet has grown. These new features bring challenges to researchers.

Social networks provide real data, and users can learn from others by retrieving brand-related microblogs with keywords, so microblog retrieval for brands is a useful application. Many microblogs contain images, so it is worth combining textual and visual information when retrieving them. The proportion of images in the retrieved results should also be higher, so that users can learn more information. In this thesis I propose a microblog re-ranking method that combines visual and textual features. The method uses a semi-supervised probabilistic graphical model to build single graphs, and the single graphs are connected into a multiple graph. Different kinds of information can be processed in one framework, and the weight of each kind of feature is adjusted automatically. Finally, I use experiments to demonstrate the effectiveness of the proposed method.

I also propose a graph-based visualization method for large-scale microblog datasets. I connect microblogs by content similarity rather than by social information. This avoids the information loss that occurs when a social dataset is sampled, because sampling breaks friendship and retweet chains. I use a force-directed graph layout method to gather similar microblogs together, which generates a 2D point cloud graph. Firstly, preprocessing, duplicate removal and sampling steps reduce the data to a small group. Secondly, a graph is built, and a graph layout algorithm generates a 2D layout. Finally, I present visualization software that uses a heat map to show the layout result, together with a real-time similar-microblog retrieval method that lets users inspect the data in detail. Experiments are conducted on the Brand-Social-Net dataset, which contains 3,000,000 microblogs. The experiments show that the visualization results help users find data patterns that are useful for understanding the dataset.
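
The visualization pipeline (similarity graph, force-directed layout, heat map) can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract does not state which similarity measure, graph construction or layout algorithm was used, so cosine similarity, a k-nearest-neighbour graph and a Fruchterman-Reingold style layout are assumptions made here, and all function names (`knn_edges`, `force_directed_layout`, `density_grid`) are hypothetical.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_edges(features, k=3):
    """Undirected edges linking each item to its k most similar items (assumed graph)."""
    edges = set()
    for i, u in enumerate(features):
        scored = sorted(
            ((cosine(u, v), j) for j, v in enumerate(features) if j != i),
            reverse=True,
        )
        for _, j in scored[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


def force_directed_layout(n, edges, iterations=100, seed=0):
    """Fruchterman-Reingold style layout; returns a list of [x, y] positions."""
    rng = random.Random(seed)
    pos = [[rng.random(), rng.random()] for _ in range(n)]
    k_opt = math.sqrt(1.0 / n)  # ideal edge length for a unit-area drawing
    temperature = 0.1           # maximum step length, cooled every iteration
    for _ in range(iterations):
        disp = [[0.0, 0.0] for _ in range(n)]
        # Repulsion between every pair of nodes (magnitude k^2 / d).
        for i in range(n):
            for j in range(i + 1, n):
                dx = pos[i][0] - pos[j][0]
                dy = pos[i][1] - pos[j][1]
                d = max(math.hypot(dx, dy), 1e-9)
                f = k_opt * k_opt / (d * d)
                disp[i][0] += dx * f
                disp[i][1] += dy * f
                disp[j][0] -= dx * f
                disp[j][1] -= dy * f
        # Attraction along edges (magnitude d^2 / k) pulls similar items together.
        for i, j in edges:
            dx = pos[i][0] - pos[j][0]
            dy = pos[i][1] - pos[j][1]
            d = max(math.hypot(dx, dy), 1e-9)
            f = d / k_opt
            disp[i][0] -= dx * f
            disp[i][1] -= dy * f
            disp[j][0] += dx * f
            disp[j][1] += dy * f
        # Move each node along its displacement, capped by the temperature.
        for i in range(n):
            length = max(math.hypot(disp[i][0], disp[i][1]), 1e-9)
            step = min(length, temperature) / length
            pos[i][0] += disp[i][0] * step
            pos[i][1] += disp[i][1] * step
        temperature *= 0.95
    return pos


def density_grid(pos, bins=16):
    """Count points per cell; the counts are the values a heat map would colour."""
    xs = [p[0] for p in pos]
    ys = [p[1] for p in pos]
    x0, y0 = min(xs), min(ys)
    w = max(max(xs) - x0, 1e-9)
    h = max(max(ys) - y0, 1e-9)
    grid = [[0] * bins for _ in range(bins)]
    for x, y in pos:
        col = min(int((x - x0) / w * bins), bins - 1)
        row = min(int((y - y0) / h * bins), bins - 1)
        grid[row][col] += 1
    return grid


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy stand-in for microblog content features (not real data).
    features = [[rng.random() for _ in range(5)] for _ in range(40)]
    edges = knn_edges(features, k=3)
    layout = force_directed_layout(len(features), edges)
    for row in density_grid(layout, bins=8):
        print(" ".join(str(c) for c in row))
```

The all-pairs repulsion loop costs quadratic time per iteration, so a sketch like this only suits a small sample, which matches the abstract's point that preprocessing, duplicate removal and sampling come before the layout step.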
Keywords/Search Tags: Social Network, Brand, Microblog, Re-ranking, Visualization