| Community detection is a key research topic in data mining and has advanced considerably over years of study. Most of this work, however, has focused on homogeneous information networks. With the steady refinement of Web 2.0 technology and the proposal of the Web 3.0 concept, the amount of information has expanded explosively, and the data it carries is growing rapidly and accumulating at very large scale. Traditional techniques and research methods designed for homogeneous information networks do not retain their efficiency and accuracy when applied to heterogeneous information networks. To mine valuable information from various kinds of networks, community detection in heterogeneous information networks has become a leading research direction in this field. However, owing to the complexity and diversity of heterogeneous information networks, the theoretical concepts and related techniques still need improvement. Detecting community structure accurately in heterogeneous information networks is therefore of vital importance. The Label Propagation Algorithm is one of the classic methods for detecting communities. This paper proposes Sem-COPRA, a label propagation algorithm based on mixture similarity that improves on the traditional algorithm by using the semantic information in heterogeneous information networks. Sem-COPRA first adopts the LDA model to generate k-dimensional semantic vectors for the nodes that carry semantic information. Then, through a novel Semantics Sharing Method, the semantic vectors are extended across the whole network to the other types of nodes, which do not contain any semantic information. A mixture similarity measure is then proposed that uses both topological and semantic similarity. Applying this mixture similarity measure to the original network yields a weighted network model. In this weighted network, where semantic information is strengthened, the classic COPRA algorithm is improved in three respects: the label coefficient is redefined using mixture similarity so that semantic information is taken into account by the algorithm; the semantic importance of nodes is introduced to rank the nodes, which reduces the instability caused by randomness; and the label selection process is optimized using a threshold on semantic importance. Experiments are carried out on DBLP, Weibo, and other datasets, comparing the community detection results of Sem-COPRA with those of other classic community detection algorithms. The experiments show that Sem-COPRA detects stable community structure and achieves better accuracy than the other algorithms in heterogeneous information networks. |
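
The abstract does not give the formula for the mixture similarity, so the following is only a minimal sketch of the idea of weighting edges by both topological and semantic similarity. It assumes a convex combination, with a hypothetical parameter `alpha`, of Jaccard similarity over closed neighborhoods and cosine similarity over semantic vectors. The function names, the toy graph, and the toy vectors are all illustrative assumptions. In the paper's pipeline the vectors would come from the LDA model and the Semantics Sharing Method rather than being written by hand.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def jaccard(adj, a, b):
    """Jaccard similarity of the closed neighborhoods of nodes a and b."""
    na = adj[a] | {a}
    nb = adj[b] | {b}
    return len(na & nb) / len(na | nb)


def mixture_weights(adj, sem, alpha=0.5):
    """Assign each undirected edge a weight mixing both similarities.

    alpha is a hypothetical mixing parameter (an assumption, not from the paper):
    alpha = 1 uses topology only, alpha = 0 uses semantics only.
    """
    weights = {}
    for a in adj:
        for b in adj[a]:
            if a < b:  # visit each undirected edge once
                topo = jaccard(adj, a, b)
                semantic = cosine(sem[a], sem[b])
                weights[(a, b)] = alpha * topo + (1 - alpha) * semantic
    return weights


# Toy heterogeneous network: two papers (p1, p2) and one author (a1).
adj = {"p1": {"p2", "a1"}, "p2": {"p1", "a1"}, "a1": {"p1", "p2"}}
# Hand-written 2-dimensional semantic vectors, standing in for LDA output.
sem = {"p1": [0.9, 0.1], "p2": [0.8, 0.2], "a1": [0.85, 0.15]}

weights = mixture_weights(adj, sem, alpha=0.5)
for edge, w in sorted(weights.items()):
    print(edge, round(w, 3))
```

The resulting dictionary plays the role of the weighted network on which a label propagation step could then run. How Sem-COPRA actually uses these weights in the label coefficient is not specified in the abstract and is not modeled here.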