Font Size: a A A

Discovering Community Structures From Networks Based On Graph Mining

Posted on:2009-02-23Degree:DoctorType:Dissertation
Country:ChinaCandidate:F WeiFull Text:PDF
GTID:1100360272489278Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
Real-world networks naturally contain a lot of community structures which have been recognized as an important statistical feature of networked systems. Such as, it might be a set of club members in social networks; in biological networks, it possibly represents a group of genes assembled by functional interactions; in semantic web, it might be a collection of web pages related with a topic. Generally, the community is a subset of tightly-linked vertices. Its connections among community vertices are relatively denser than the ones crossing to the rest of the network. How to efficiently mine these structures is a crucial problem to know and analyse networks.Despite of current achievements in the discovering community structures from networks, there are still a lot of problems requiring to be studied. The performances of some algorithms are not efficient, and the measurements of community structure are not perfect. Quite less work has been done on the discovery of overlapping community structure, even though it could better capture the nature of network in some real-world applications. In this paper, we get the seed sets of community structures with the help of classical algorithms and find the overlapping or non-overlapping structures based on seed expansion. The contributions of this paper are highlighted as follows:1. With the multi-level method, we apply the classical spectral partition algorithm to produce seed sets. At the same time, we make an analysis about their advantages. The multi-level strategy brings a faster partitioning speed when computing the Fiedler vector of the Laplacian matrix. And, the spectral bisection algorithm helps to find the desired partitioning point of graph. The method helps us to find good seeds from whole network structure, which grasps the main bodies of target community structures.With real-world data, we validate the seeds have good performances. 2. We propose a novel algorithm which discovers the community structures from seed expansion. The algorithm is based on modularity function and trans-missive probabilities. The modularity Q defined by Newman and Girvan is a popular criterion to measure the community structures. The change of the Q value is used to evaluate the contribution of newly-expanded vertices to current seed group. The transmissive probabilities are used to infer relationships between neighborhood vertices and reflect the weights of getting to newly-expanded vertices. Their source comes from a set of seed vertices which have initial probabilities(initial weights). By sorting the probabilities of newly-expanded vertices at each time step, we get a descending order of vertices. According to this order, the contribution of newly-expanded vertices to seed sets are computed. The contributing values decide which vertices to be further expanded. Chapter 4 describes the algorithm and also gives an analysis upon the pruning of vertices and escaping probability of expansion process.3. For the overlapping community structures, we present a new method to detect them from networks. It opens a new avenue for solving the overlapping problem. The algorithm is still based on seed expansion. Combined with random walks technique, we give a reasonable expansion process which is scaled by time steps. At each time step, our algorithm firstly computes the degree-normalized probabilities of all vertices. According to the descending order in probabilities, all vertices are swept. Then, the vertices which will be further extended at next time step are decided. By the sweep, the algorithm also detects which newly-extended vertices are contributors to current seed set. The detection helps us find the best community structures of candidates at current step. With the contribution property, we give some theorems. Some useless vertices can be safely pruned based on these theorems. The expansion process repeats these steps until it satisfies the conditions that the overlapping rates among communities are beyond the tolerance of users or the mixing time of random walk is reached.In the fifth chapter, we also have a theoretic analysis of the whole expansion process. Our proposed method makes the community candidates have best structures at each time step and can bring good expanded structures to the initial seed sets.4. In six datasets, we evaluate these algorithms from different aspects. The datasets come from real-world networks which have various sizes and relate to several fields. There are a series of experiments, such as seed choosing, time tests and comparison to other methods. The experimental results show that our algorithms have good performances and give enough examples to prove that overlapping is important to find complete community structures, which let people be aware of the significance of overlapping community discovery in networks.In summary, this thesis introduces some novel algorithms to discover community structures from networks. The algorithms are seed expansion and their expansion processes are based on random walks. The expansion structures are measured by modularity function Q. We have some theoretic analyses and experiments to evaluation them. The results show that our methods can find the perfect community structures and have good performances.
Keywords/Search Tags:Network, Community Structure, Overlapping community structure, Seed Expansion, Modularity Function, Transmissive Probability, Random walk
PDF Full Text Request
Related items