Font Size: a A A

Research On Intelligent Algorithms For Network Community Mining

Posted on:2011-02-01Degree:MasterType:Thesis
Country:ChinaCandidate:D X HeFull Text:PDF
GTID:2120360305455320Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
In the real world, manycomplex systems take the form of networks. Examples includemetabolic networks, the Internet, scientific collaboration networks and so on. Networks, ingeneral, are constituted by a set of nodes or vertices and bya set of links or edges, where anode is an individual member in the complex system and an edge is a link between nodesaccording to a relation in the system. The study of complex networks attracts manyresearchers from a widevarietyof research fields, and has become a major interdisciplinaryresearchareainrecentyears.Recently, the characterization of community (or module) structures that most complexnetworks seem to share attracts more and more attentions from researchers of differentresearch area. Acommunity in a network is usually thought of as a group of nodes that aresimilartoeachother anddissimilar from therest of thenetwork. It's believed that nodes aredensely connected within communities while being loosely connected to the rest of thenetworks. The capability of detecting the partitioning of a network in communities couldclearly have important theoretical significance and practical value to analyze the topologystructures of complex networks, understand the functions of complex networks, and predictthe behaviors of complex networks. It has been widely applied in many areas, such asterrorist organization identification, protein function prediction, link prediction, Webcommunitymining,searchengineetc.As the identification of communities has important theoretical significance andpractical value, the problem of communitymining has been receiving a lot of attention andmany different algorithms have been proposed to detect the community structures.Accordingtothebasicsolvingstrategyadoptedbyalgorithms,mostofthesealgorithmscanbe divided into two categories: heuristic based methods and optimization. The formersolves the problem of communitymining based on some intuitive assumptions or heuristicrules——the famous GN algorithm for example. The latter solves the problem ofcommunity mining by transforming it into an optimization problem and trying to find anoptimal solution for a predefined objective function such as the network modularityemployed in several algorithms. Unfortunately, maximizing the network modularity Q hasbeen proven to be a nondeterministic polynomial time (NP)-complete problem, whichmakes it unable to carry out an exhaustive search of all possible divisions for the optimalvalue of Q in a large network. Genetic algorithm (GA) is an efficacious method for solvingNP-complete problems, and able to dramatically reduce the time complexityof solving theproblems while ensuring the quality of the solutions. The existing algorithms for community mining based on GAhave drawbacks in global search. The major cause whichleads to these drawbacks is that commonlyused crossover operators can not be suitable forcommunity mining in complex networks. The existing genetic based algorithms forcommunity mining remove the crossover operator from genetic algorithm or can't provideaneffectivecrossoveroperator.Against the deficiency of the existing community mining algorithm based on GA, wepropose a Genetic Algorithm with Ensemble Learning (GAEL) for community mining incomplexnetworksinthispaper.GAEL adopts the string encoding strategy. A potential solution of the problem (achromosome) is coded as an integer string of constant length and the value of an integer inthestringrepresentstheidentifierofthecommunityinwhichthenodeisclassified.GAEL employs the network modularity or Q function as the objective function andtakesitasfitnessfunctiontoevaluatethepotentialsolutions.It is noted that the function of the ensemble learning method for combining multipleclustering solutions into a better one is somewhat similar to that of a crossover operator ofGAthat works to mix clustering information of different individuals into a new better one.In this paper, an ensemble learning based multi-individual crossover operator is proposedfor GAEL. The crossover operator combines the clustering information of multiple parentindividuals to generate offspring, assisted by the local information of network topology.The crossover operation proposed in this paper falls in the general category ofagglomerativehierarchicalalgorithm.Todeterminetheorderinwhichedgesareadded,twoconcepts"edgejoinstrength"and"edgestructuralsimilarity"wereintroducedinthispaper.Edge join strength is a measure of the probability that an edge lies within communityutilizing clustering information of parent individuals by ensemble learning method; Edgestructural similarity is another measure of the probability that an edge lies withincommunity utilizing the local information of network topology according to theacquaintance model in sociology. As the concept of edge join strength utilizes rationallyprior knowledge, it is a more accurate measurement. Therefore, edge join strength is usedas primary measure according to which edges are ordered, while edge structural similarityis used as auxiliary measure which is used only when edges have identical value of edgejoinstrength.GAELusesanensemblelearningbasedmulti-individualcrossoveroperatortoproduce new individuals. Therefore, GAEL can avoid the problem that's brought bytraditional crossover operators which ignore the clustering contents and only exchangestring blocks of different individuals. Furthermore, our ensemble learning basedmulti-individual crossover operator can transmit the excellent characteristics from onegeneration to the next,thus the global search ability of crossoveroperator can be exhibited.In order to make full use of ensemble learning, an Individual Generation Algorithmbased on Markov Random Walk (IGAMRW) is proposed in this paper. From the viewpointof Markov random stochastic theory, if network has the property of community structure, agents that start from nodes within the community of the destination node should havemore paths to choose from in order to reach the destination node within some l steps, if l isset properly. On the contrary, agents that start from nodes outside the community of thedestination node have a much lower probability of eventually arriving at the destination.Based on the above theory, under the action of the network modularity Q, IGAMRWgenerates initial individuals by the following recursive procedure: first, select randomly thedestination node t from the network (or sub-network), and calculate the probability thatagent starting from current node can eventually arrive at the destination node t within lsteps for each node. Then, rank all the nodes according to their associated probabilityvalues in descending order, and find a cutoff point which the cut that results in the greatestincrease in global modularity is based on. If there is no cutoff point which can results in anincrease in global modularity, the recursive procedure is terminated; otherwise, divide thenetwork (or sub-network) into two parts according to the found cutoff point, and iterate theabove procedure for all the sub-networks. In each recursive call procedure, select randomlythe destination node t from the network (or sub-network), and so different solutions(individuals) can be obtained by running multiple times the algorithm of IGAMRW on thesame network. Therefore, IGAMRW can provide diverse individuals. In addition,individuals generated by the combined action of Markov random walk theory andmodularity Q exhibit some accuracy. Thus it can be seen that IGAMRW can producediverse and accurate individuals, and these individuals are very suitable for ensemblelearning. The population initializing method cooperates with the ensemble learning basedcrossover operator, thus the search capability of GAEL is effectively strengthened.A local search strategy is used in the mutation operator, which makes the mutated nodeplaced into the community which most of its neighbors belong to. When the node has anequal maximum number of neighbors in two or more communities, we break ties randomlyamong the possible candidates. Local search strategy is able to find the problem of a smallnumber of nodes misplaced which is difficult to be found by global fitness evaluation, thusit can fulfill the task of local search and leads the evolution process to a better direction.In order to retain the fittest individual in each generation and improve the convergencespeed of genetic algorithm, ? ?? selection that genetic algorithms for combinatorialoptimization have a partiality for is used in this paper.GAEL is terminated on condition that at least one of the following two requirements ismet: 1) the fittest individual in the population has not been improved for a predefinednumber of generations; 2) the number of the current generation is beyond limitation.The proposed GAEL is tested on both computer-generated and five real-worldnetworks with wide range of application, and is compared with current representativealgorithms in community mining. Experimental results show the feasibility and validity ofGAEL.Our future work can be laid as follows. First, we intend to improve performance of GAEL by considering the two aspects——speed and precision. Second, we intend toapply it to the analyze real-world large-scale networks, and try to uncover and interpretsignificative community structure that is expected to be found on them.
Keywords/Search Tags:Complex Network, Community Structure, Genetic Algorithm, Ensemble Learning, LocalSearch
PDF Full Text Request
Related items