
Research On The Structure Mining Algorithms For Online Networks And Their Applications

Posted on: 2020-03-19
Degree: Master
Type: Thesis
Country: China
Candidate: J Wang
GTID: 2428330596975723
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of the mobile Internet, online service platforms for social networking, shopping, job hunting, and financial services have attracted large numbers of users, and many user activities are moving from offline to online. These online activities have generated massive amounts of network data. On the one hand, the data record basic information about users (e.g., age, gender, religion, career, and birthplace) and descriptive information about users and items (e.g., text, images, and tags). On the other hand, they record social relations between users (e.g., friends and colleagues) and relations between users and items (e.g., purchases and comments). These relations can be described by networks whose nodes correspond to users or items, such as user-user social networks and user-item bipartite networks. With the fast development of network science, network structure mining problems such as link prediction and community detection have become active multidisciplinary research topics, and many link prediction and community detection algorithms have been proposed and applied to problems in diverse areas. Targeting online user-user and user-item networks, this thesis studies network structure mining algorithms that accurately predict latent links and effectively identify community structures, and then applies these algorithms to the prediction of socioeconomic systems. The main contributions of the thesis are as follows:

(1) This thesis proposes three link prediction algorithms. First, because tag information is largely ignored by traditional link prediction algorithms, we propose a tag-based link prediction algorithm (TLP). Building on link prediction algorithms that use only network structure, we introduce information entropy to measure the homogeneity of the tag system formed by the tags of a node and its neighbors. TLP incorporates tag information into the computation of node similarities and defines node attractiveness in terms of the information entropy of tag systems. We compared TLP with 6 other algorithms on 4 different networks and found that TLP performs well. Second, considering the similarity between link prediction in bipartite networks and the information recommendation problem, the thesis extends the concept of tag information entropy to link prediction in user-item bipartite networks. We propose the personalized recommendation algorithm TPR, which uses information entropy to characterize the weights of item tags for users. We define the importance of tags and the degree to which users favor them, and introduce both into the computation of item scores for users. We tested TPR on user-movie ratings and movie tags from douban.com. Compared with the classic collaborative filtering algorithm, TPR improves accuracy by 10.9%. Third, because heavy-tailed degree distributions lead to highly unbalanced prediction lists in traditional link prediction algorithms, the thesis proposes a personalized link prediction algorithm based on network diffusion (HLH). In a comparison with 6 other link prediction algorithms on 4 different online networks, HLH performs best.

(2) This thesis proposes an algorithm that performs community detection and link prediction simultaneously. Because online networks may contain many missing and noisy links, traditional community detection and link prediction algorithms cannot perform their tasks effectively when applied directly. The thesis proposes an algorithm named Cluster-driven Low-rank Matrix Completion (CLMC). Unlike traditional community detection algorithms, which operate directly on the network, CLMC regards the adjacency matrix as a superposition of a community matrix, a noise matrix, and a completion matrix. It adds constraints to decompose the adjacency matrix into these three matrices and uses the community matrix to detect communities. In other words, CLMC does not work directly on the original network. Instead, it takes a matrix completion perspective and learns an ideal block-diagonal matrix by removing noisy edges and supplementing missing edges under cluster-structure and low-rank constraints. Meanwhile, in the course of learning the low-rank block matrix, the completion matrix of CLMC effectively represents the latent linking relations between nodes and can be used for link prediction. To test CLMC, we use 4 online networks together with 2 other real-world networks and several benchmark networks, so that the tested networks are diverse. Compared with 8 traditional community detection algorithms, CLMC can effectively identify community structures in networks with noisy and missing links by using the community matrix. Compared with 16 traditional link prediction algorithms, CLMC can also effectively predict latent links by using the completion matrix.

(3) This thesis applies link prediction and community detection algorithms to predict economic status and analyze economic structure. First, because traditional methods for economic situational awareness consume many resources and suffer from long delays, we exploit the social attributes of users in online networks and infer economic status from online network structures. Specifically, based on resume data from online recruitment platforms and the follow relations among users on Weibo, we built a talent flow network and an information flow network among regions, respectively. After analyzing the correlations between network structural features and regional economic development, we found that the talent flow network exhibits stronger predictive power for economic development than the information flow network. In particular, the composite index of the structures of both networks can explain up to about 83.8% of the variance in GDP. Then, because the talent flow network may contain missing links and potential links, we apply link prediction algorithms to predict links in the talent flow network and improve its ability to forecast regional economic development. Specifically, we applied the HLH and CLMC link prediction algorithms to the talent flow network and built the talent flow trend network. The results show that the explanatory power of the composite index of the talent flow trend network's structural features for GDP increases significantly. Moreover, the talent flow trend network based on CLMC exhibits stronger predictive power than the one based on HLH. Finally, we apply community detection algorithms to analyze the information flow network and reveal regional economic structural characteristics and potential economic structural risks. Specifically, we predicted the status of economic development and the industrial structure using online registration data of Weibo users. Moreover, we applied the CLMC community detection algorithm to the information flow network and then predicted the type of regional industrial structure using the detected community structure.
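
As an illustration of the tag-entropy idea in contribution (1), the following minimal Python sketch computes the Shannon entropy of the tags carried by a node and its neighbors. The abstract does not give the thesis's exact definition, so plain Shannon entropy over pooled tag frequencies is an assumption here, and the function name `tag_entropy` and the toy data are hypothetical. Lower entropy corresponds to a more homogeneous tag system.

```python
import math
from collections import Counter


def tag_entropy(node, neighbors, tags):
    """Shannon entropy (in bits) of the tags on `node` and its neighbors.

    Assumption: tags are pooled with equal weight; the thesis may weight
    or normalize them differently.
    """
    pool = [node, *neighbors.get(node, ())]
    counts = Counter(t for n in pool for t in tags.get(n, ()))
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no tags at all: treat as zero entropy
    # sum of p * log2(1/p) over the observed tag frequencies
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical toy network: adjacency lists and per-node tags.
neighbors = {"u": ["v", "w"], "x": ["y"]}
tags = {"u": ["jazz"], "v": ["jazz"], "w": ["jazz"],
        "x": ["jazz"], "y": ["rock"]}

homogeneous = tag_entropy("u", neighbors, tags)  # one shared tag
mixed = tag_entropy("x", neighbors, tags)        # two tags, evenly split
```

In this toy data, node `u` and its neighbors share a single tag, so its entropy is zero, while node `x` and its neighbor carry two equally frequent tags and score one bit.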
Keywords/Search Tags:Online Network, Network Structure Mining, Link Prediction, Community Mining, Economic Predictions