
Research On Community Detection Technology Based On Network Representation Learning

Posted on: 2019-04-11
Degree: Master
Type: Thesis
Country: China
Candidate: Y Hu
Full Text: PDF
GTID: 2428330623950741
Subject: Computer Science and Technology
Abstract/Summary:
With the increasing complexity of online social networks, nodes have become rich in multi-source information. Apart from network structure, other information carried by the nodes themselves, such as attributes and user-generated content, is also of great value. Most existing community discovery methods detect communities from the network topology alone. As a result, plenty of user information is not fully utilized, and the detected community structure cannot describe the organization mechanism of real-world networks well, so classical community detection algorithms face serious challenges. In view of these problems, this paper approaches the research from two perspectives: one is how to fuse multi-source information to accurately describe the features of users, and the other is how to implement community detection based on user features. The main contributions are summarized as follows:

(1) In order to characterize users more accurately in multi-source complex networks, this paper proposes a novel user representation model (User2vec) based on network representation learning. Firstly, we model attribute information, user content and the relationship network to generate three independent representation vectors. To be more specific, the attribute representation vector (info2vec) is generated by extracting attribute features. Different kinds of content that play an important role in user-generated texts, such as source, emoji and topic, are separated, and their representation vectors, extracted by algorithms including TF-IDF, LDA and Doc2vec, are concatenated into the content representation vector (blog2vec). The network structure, enhanced and extended by text features, is embedded into the corresponding representation vector (graph2vec). Then, two user representation models (User2vec-m1 and User2vec-m2) that integrate multi-source information are established to generate the user representation vector user2vec. Finally, a series of experiments is conducted to verify the performance differences between the independent representation vectors and the fused representation vector user2vec. The experimental results show that user2vec significantly improves the accuracy of two inference tasks and has many advantages in user profiling.

(2) Based on the user representation vector user2vec, this paper designs a three-stage community detection algorithm (3SComs). Firstly, to address the problem of selecting the initial value of K in K-Means, modularity is taken as the optimization objective of the clustering algorithm, and a heuristic K-Means algorithm is proposed to realize non-overlapping community discovery in the first stage. Secondly, we use random undersampling to build training samples for each community label and utilize an AdaBoost classifier to establish a multi-label prediction model, thus generating a membership matrix for the users, from which the label list of every user is extracted. Then, considering that different edge weights in the network indicate different propagation abilities, we extend a multi-label propagation rule to weighted networks and use an overlapping community discovery algorithm, wMLPA, to detect the community structures. Finally, the experimental results from the different stages are used to verify the performance improvement of the proposed algorithm. Meanwhile, overlapping communities are generated on partial data from Sina Weibo with 3SComs, and the detected community structure is compared with real-world group characteristics to verify the feasibility of the proposed algorithm.

In a word, this paper builds user representations in complex networks by means of network representation learning and then detects overlapping communities, which provides a novel method for research on social network topology.
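
The first stage of contribution (2), choosing K for K-Means by using modularity as the objective, can be illustrated with a minimal sketch. The abstract does not give the thesis's actual heuristic, so everything below is an assumption made for illustration: the function names (`kmeans`, `modularity`, `select_k`), the farthest-first initialization, the exhaustive scan over candidate K values, and the toy data. It clusters user vectors for each candidate K and keeps the partition whose Newman modularity on the relationship network is highest.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100):
    """Plain Lloyd's K-Means; returns one cluster label per point.

    Assumption: centers start from a deterministic farthest-first
    traversal, which is not necessarily what the thesis uses.
    """
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points
        ]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def modularity(edges, labels):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over communities c of  L_c / m - (d_c / (2 m))^2,
    where L_c is the number of edges inside c, d_c the total degree of c,
    and m the number of edges. Nodes are integers indexing into labels.
    """
    m = len(edges)
    inside, degree_sum = {}, {}
    for u, v in edges:
        for node in (u, v):
            degree_sum[labels[node]] = degree_sum.get(labels[node], 0) + 1
        if labels[u] == labels[v]:
            inside[labels[u]] = inside.get(labels[u], 0) + 1
    return sum(
        inside.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree_sum.items()
    )


def select_k(points, edges, candidates):
    """Return (k, Q, labels) for the candidate k with the highest modularity."""
    best = None
    for k in candidates:
        labels = kmeans(points, k)
        q = modularity(edges, labels)
        if best is None or q > best[1]:
            best = (k, q, labels)
    return best


# Hypothetical toy data: six users, two tight groups joined by one edge.
vectors = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
k, q, labels = select_k(vectors, edges, candidates=range(2, 5))
print(k, round(q, 4), labels)
```

Scanning every candidate K is the simplest way to treat modularity as the selection objective; a heuristic search, as the abstract suggests, would presumably avoid evaluating all candidates on large networks.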
Keywords/Search Tags: Multi-source Complex Network, User Representation Model, Community Detection, Heuristic K-Means