Research On Label-Based Method Of Community Detection In Heterogeneous Information Network

Posted on:2018-08-13

Degree:Master

Type:Thesis

Country:China

Candidate:C Lv

Full Text:PDF

GTID:2348330542990824

Subject:Engineering

Abstract/Summary:

As one of the key research topics in data mining, community detection has advanced considerably over years of study, yet most work has remained focused on homogeneous information networks. With the gradual refinement of Web 2.0 technology and the proposal of the Web 3.0 concept, the amount of information has expanded explosively, and the data carried by these networks is growing and accumulating on a very large scale. Traditional techniques and research methods designed for homogeneous information networks do not carry their efficiency and accuracy over to heterogeneous information networks. To mine valuable information from diverse networks, research on community detection in heterogeneous information networks has moved to the forefront of the field. However, because of the complexity and diversity of heterogeneous information networks, the theoretical concepts and related techniques still need improvement. Accurately detecting community structure in heterogeneous information networks is therefore of great importance.

The Label Propagation Algorithm is one of the classic methods for detecting communities. This paper proposes Sem-COPRA, a label propagation algorithm based on mixture similarity that improves on the traditional algorithm by using the semantic information in a heterogeneous information network. Sem-COPRA first uses the LDA model to generate k-dimensional semantic vectors for the nodes that carry semantic information. Then, through a novel Semantics Sharing Method, the semantic vectors are extended to the rest of the network, including the various types of nodes that do not contain any semantic information. Next, a mixture similarity measure that combines topological and semantic similarity is proposed. Applying this measure to the original network yields a weighted network model. In this weighted network, where semantic information is strengthened, the classic COPRA algorithm is improved in three respects: the label coefficient is redefined using mixture similarity so that the algorithm takes semantic information into account; the semantic importance of nodes is introduced to rank the nodes, which reduces the instability caused by randomness; and the label selection process is optimized with a threshold on semantic importance.

Experiments are carried out on DBLP, Weibo and other datasets, comparing the community detection results of Sem-COPRA with those of other classic community detection algorithms. The experiments show that Sem-COPRA detects stable community structure and achieves better accuracy than the other algorithms on heterogeneous information networks.
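
As a rough illustration of the mixture similarity idea described in the abstract, the Python sketch below weights each edge by a blend of a topological score and a semantic score. The abstract does not give the actual formulas, so everything here is an assumption chosen for illustration: Jaccard overlap of neighbourhoods as the topological similarity, cosine similarity of the semantic vectors as the semantic similarity, the linear blending parameter `alpha`, and the function names. The semantic vectors are hard-coded stand-ins for what LDA and the Semantics Sharing Method would produce.

```python
import math

def jaccard(adj, u, v):
    """Topological similarity (assumed): Jaccard overlap of closed neighbourhoods."""
    nu = adj[u] | {u}
    nv = adj[v] | {v}
    return len(nu & nv) / len(nu | nv)

def cosine(a, b):
    """Semantic similarity (assumed): cosine of two k-dimensional vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def mixture_similarity(adj, vectors, u, v, alpha=0.5):
    """Linear blend of both scores; alpha is a hypothetical tuning parameter."""
    return alpha * jaccard(adj, u, v) + (1 - alpha) * cosine(vectors[u], vectors[v])

def weight_edges(adj, vectors, alpha=0.5):
    """Build the weighted network: one mixture-similarity weight per undirected edge."""
    weights = {}
    for u in adj:
        for v in adj[u]:
            if u < v:  # visit each undirected edge once
                weights[(u, v)] = mixture_similarity(adj, vectors, u, v, alpha)
    return weights

# Toy network: adjacency sets plus 2-dimensional semantic vectors (made-up values).
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
vectors = {"a": [0.9, 0.1], "b": [0.8, 0.2], "c": [0.5, 0.5], "d": [0.1, 0.9]}
weights = weight_edges(adj, vectors)
for edge, w in sorted(weights.items()):
    print(edge, round(w, 3))
```

In this toy network the edge between `a` and `b`, which share all neighbours and have similar vectors, receives a higher weight than the edge between `c` and `d`. A label propagation step could then use such weights when computing label coefficients, although how Sem-COPRA does so is not specified in the abstract.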

Keywords/Search Tags:

heterogeneous information network, community detection, label propagation, mixture similarity

Related items

1	Research Of Community Detection Algorithms Based On Label Propagation
2	Label Propagation Based On Community Core For Community Detection Algorithm
3	Research On Community Discovery Algorithm Based On Node Similarity
4	The Research Of Real-time Community Detection Algorithm Based On Label Propagation
5	Research On Community Detection Algorithms In Social Network
6	Algorithm For Overlapping Community Detection Based On Label Propagation
7	Study Of Community Detection Algorithms Based On Node Similarity
8	Improvement Of Overlapping Community Detection Based On Label Propagation Algorithm
9	Research On Community Detection Method Of Social Network Based On Label Influence
10	Semi-supervised Community Detection Algorithm Based On Local Label Information