Research And Improvement For Web Community Identification Technique

Posted on: 2007-06-21
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Yao
Full Text: PDF
GTID: 2178360212957266
Subject: Software engineering
Abstract/Summary:
The Web is a complex collection of hypertext that is expanding at tremendous speed, and finding and applying usable information on it is a challenging task. As the Web grows, many communities emerge within it, and these communities carry important information about how the Web is organized. Communities can provide users with valuable, credible, and timely information. A Web community reflects social activity on the Web, so studying Web communities in depth reveals both the knowledge the Web contains and the state of its organizational structure.
This thesis classifies the currently popular community identification techniques and implements the methods based on complete bipartite graphs and on the maximum flow algorithm. Using the same topics for both, it compares the Web communities obtained by these methods and analyzes their characteristics. For the maximum-flow-based technique in particular, it studies the relation between edge capacity and community scale.
The previously proposed approach has the problem that a certain graph structure containing noise is always extracted. This is mainly caused by assigning every edge the same constant capacity, which treats all edges as equally important, whereas in fact each hyperlink carries information of different value. To improve community quality, this thesis analyzes the characteristics of the link structure and, based on the characteristics of its probability distribution, proposes a method of assigning edge capacity that uses the power-law distribution of Web pages' in-degree and out-degree. The method accounts for differences in importance not only among edges leaving the same node but also among edges leaving different nodes. In addition, the thesis summarizes some problems encountered in the experiments and proposes solutions.
The experiments show that this method improves on the traditional maximum flow algorithm and raises the quality of the identified Web communities. This thesis presents an improved edge capacity assignment method for Web community identification based on the characteristics of the probability distribution, and offers new approaches and ideas that guarantee the underlying communities can be identified. It is therefore of both theoretical and practical value.
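As a rough illustration of the maximum-flow idea summarized in the abstract, the sketch below finds a community as the source side of a minimum cut around a set of seed pages. It is not the thesis's implementation. The graph construction (links usable in both directions, a virtual source attached to the seeds with unlimited capacity, a virtual sink attached to every other page with unit capacity) is one common formulation and is assumed here. The `degree_weighted` function, with its `base` and `alpha` parameters, is purely hypothetical: the abstract does not state the actual capacity formula, so it only shows where a degree-dependent capacity would plug in. The toy graph `EDGES` is invented for the example.

```python
from collections import Counter, defaultdict, deque

INF = float("inf")
EPS = 1e-9  # residual capacities below this are treated as saturated


def max_flow_community(edges, seeds, capacity):
    """Return the pages on the source side of a minimum cut around `seeds`."""
    seeds = set(seeds)
    cap = defaultdict(lambda: defaultdict(float))
    nodes = set()
    for u, v in edges:
        c = capacity(u, v)
        cap[u][v] += c  # assumption: each hyperlink is usable in both directions
        cap[v][u] += c
        nodes.update((u, v))
    s, t = "_source", "_sink"  # virtual nodes; the names are arbitrary
    for seed in seeds:
        cap[s][seed] = INF
    for n in nodes - seeds:
        cap[n][t] = 1.0

    def reachable():
        """BFS over edges that still have residual capacity; returns a parent map."""
        parent = {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > EPS and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    while True:  # Edmonds-Karp: augment along shortest residual paths
        parent = reachable()
        if t not in parent:
            break
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[a][b] for a, b in path)
        for a, b in path:
            cap[a][b] -= push
            cap[b][a] += push
    return set(parent) - {s}


def constant(k):
    """Every edge gets the same capacity k (the setting the abstract criticizes)."""
    return lambda u, v: float(k)


def degree_weighted(edges, base=2.0, alpha=1.0):
    """HYPOTHETICAL capacity rule, not the thesis's formula: favor links into
    frequently cited pages and discount links from pages with many out-links."""
    out_deg = Counter(u for u, _ in edges)
    in_deg = Counter(v for _, v in edges)
    return lambda u, v: base * (in_deg[v] / out_deg[u]) ** alpha


# Toy graph: two 4-page cliques joined by the single link d -> w.
EDGES = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d"),
         ("w", "x"), ("w", "y"), ("w", "z"), ("x", "y"), ("x", "z"), ("y", "z"),
         ("d", "w")]

small = max_flow_community(EDGES, {"a"}, constant(1))     # {'a'}
large = max_flow_community(EDGES, {"a"}, constant(2))     # {'a', 'b', 'c', 'd'}
weighted = max_flow_community(EDGES, {"a"}, degree_weighted(EDGES))
```

On this toy graph, raising the constant capacity from 1 to 2 moves the cut from just around the seed to the bridge between the two cliques, which is the kind of dependence between edge capacity and community scale the abstract refers to.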
Keywords/Search Tags: Web community, Web graph, Community identification, Maximum flow algorithm