
Research On Web Clickstream Data Analysis Based On Random Walk Model

Posted on: 2017-03-13
Degree: Master
Type: Thesis
Country: China
Candidate: P T Shi
Full Text: PDF
GTID: 2348330536467487
Subject: Management Science and Engineering
Abstract/Summary:
With the development of the Internet, a large quantity of Web clickstream data has been stored. Web clickstream data records users' online activities and has attracted wide attention from scholars around the world for its huge potential value. Previous studies on clickstream data mainly concentrate on analyzing patterns in users' surfing behavior, information recommendation, search engine optimization, etc. Little research investigates clickstream data from a network perspective to clarify the relationships among different information sources. In fact, users' online activities not only show what users care about and what they like; the way users switch among different pieces of information can also reveal the relationships among the information sources themselves. These information sources can be entries on Wikipedia, videos on YouTube, blogs on Facebook, etc. For example, some researchers have used clickstream data among different journal websites to show how various research areas are connected and how social science relates to natural science. This paper tries to give deep insight into the user-flow interdependence and the content similarities among different information sources by constructing an open flow network, applying the random walk model, and analyzing a website clickstream network and a Wikipedia entry clickstream network. This paper also gives a visualization method for clickstream networks. The work of this paper mainly covers the following two areas:

(1) The analysis of flow interdependent relationships in clickstream networks. In the third chapter, we expand the closed network into an open flow network, making the random walk model closer to users' real surfing behavior. Then we propose a method to calculate the total flow T_ij among nodes, which considers both direct flow and indirect flow. T_ij can be used to describe flow dependency relationships: it measures the reduction in the flow of node j if node i is removed from the network. The analysis results on Web clickstream data show that we can find the hidden “Users Provider” for websites and the invisible relationships among different Wikipedia entries. Based on the definition of T_ij, we also give another indicator, C_i, which evaluates the importance of a node by how much flow it controls. C_i gives a better rank order than the PageRank algorithm for websites.

(2) The analysis of content relevance relationships among different information sources, as revealed by users' surfing behavior, and visualization methods. In the fourth chapter, we define the flow distance to measure the content similarities among different information sources. The flow distance is measured as the average distance between two nodes: the closer the nodes, the more likely they contain relevant information. Then we propose a node embedding visualization method based on the flow distance, which embeds the nodes into Euclidean space and uses the Euclidean distance to depict the proximity of different information. Using this method, we can clearly see the relationships among nodes, the distribution of user flow, and which nodes play a key role in user flow diffusion. The results show that nodes close to each other are usually relevant to each other. We also visualize the structure of the website clickstream network and the Wikipedia entry network, and investigate issues such as the distribution of user flow among different websites and the dynamics of website structure.

In general, this paper provides a general model and analysis methods for flow interdependence analysis, content similarity analysis and visualization. The methods in this paper can also be extended to other research fields, such as road traffic networks and disease transmission networks.
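
The abstract does not state the formula for T_ij, so the following is only a minimal sketch of how a "direct plus indirect" flow could be computed on an open flow network. It assumes that each node forwards users in proportion to the observed click counts (a random walk), that users who leave the network go to an implicit sink, and that total flow is the direct flow plus the flow along all longer paths. The function name `total_flow`, its parameters, and the toy numbers are hypothetical and are not taken from the thesis.

```python
import numpy as np

def total_flow(direct_flow, outflow_to_sink):
    """Illustrative total flow (direct + indirect) between internal nodes.

    direct_flow[i][j]  : observed clicks going straight from node i to node j
    outflow_to_sink[i] : users who leave the network from node i
    ASSUMPTION: T = F + F M + F M^2 + ... = F (I - M)^-1, where M is the
    random-walk transition matrix; this is not the thesis's stated formula.
    """
    F = np.asarray(direct_flow, dtype=float)
    out = np.asarray(outflow_to_sink, dtype=float)
    n = F.shape[0]

    # Everything leaving node i: clicks to other nodes plus exits to the sink.
    leaving = F.sum(axis=1) + out
    safe = np.where(leaving > 0, leaving, 1.0)  # avoid dividing by zero

    # Transition probabilities among internal nodes (rows may sum to < 1).
    M = F / safe[:, None]

    # Fundamental matrix: expected visits to j for a walker starting at i.
    # It exists when every node with traffic can eventually reach the sink.
    U = np.linalg.inv(np.eye(n) - M)

    return F @ U

# Toy network with three pages; 10 users enter at node 0 from the source.
flows = [[0, 6, 2],
         [0, 0, 3],
         [0, 0, 0]]
to_sink = [2, 3, 5]
T = total_flow(flows, to_sink)
```

In this toy network, the entry for nodes 0 and 2 is 5: 2 users go there directly and 3 more arrive via node 1, which is the kind of hidden dependency that the direct click counts alone would understate.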
Keywords/Search Tags:Flow Network, Total Flow, Flow Interdependent Relationships, Flow Distance, Embedding Visualization Method