Research And Application On Web Community Mining Technology Based On Net Flow

Posted on: 2008-11-09
Degree: Master
Type: Thesis
Country: China
Candidate: Z G Feng
GTID: 2178360212495291
Subject: Computer software and theory
Abstract/Summary:
Discovering "topic" communities, so that users can quickly find knowledge on the Internet, is a research direction in Web mining. Building on a detailed study of existing Web community mining methods, this thesis develops a new Web community mining method and applies it to graph partitioning. Applications of the approach include focused crawlers and search engines, automatic population of portal categories, and improved filtering.

Firstly, after comparing existing definitions of a Web community, a new, stricter definition is introduced in which both the vertices inside the community and the vertices outside it must satisfy a stricter condition, in order to overcome the ambiguous boundary problem.

Secondly, a Web community mining algorithm is proposed based on the concept of network flow from graph theory. The Web is modeled as a graph, edge capacities are assigned from the in-degree and out-degree of the vertices, a network flow model is constructed, and the community is obtained using the maximum flow/minimum cut principle.

Thirdly, because this mining technique overcomes the ambiguous boundary problem and makes the mined community unique, it can be applied to graph partitioning. Web page partitioning is defined through an equivalence relation over the class of Web communities: two pages are equivalent if and only if the sets of Web communities that include each page coincide. Hierarchical partitioning is obtained by repeatedly applying this partitioning to the contracted graph, in which all original vertices in the same partition are contracted into one vertex.

Finally, a simple Web community search engine system was built using a Web crawler and the open-source Lucene library. It ranks results by correlation and groups the search results.
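The maximum flow/minimum cut step and the community-based equivalence relation described above can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: the abstract does not give the degree-based capacity rule, so every link is assumed to have unit capacity in both directions, and the function names, the virtual source `__source__`, and the toy graph are hypothetical. The maximum flow is computed with the standard Edmonds-Karp method.

```python
from collections import defaultdict, deque


def min_cut_community(edges, seeds, sink, capacity=1):
    """Return the source side of a minimum cut separating the seeds from the sink.

    Assumption: each undirected link gets the same capacity in both directions.
    """
    if sink in seeds:
        raise ValueError("the sink must not be a seed page")
    source = "__source__"  # hypothetical virtual source joined to every seed
    residual = defaultdict(lambda: defaultdict(int))
    for u, v in edges:
        residual[u][v] += capacity
        residual[v][u] += capacity
    for s in seeds:
        residual[source][s] = float("inf")

    def bfs_parents():
        # Breadth-first search over edges that still have residual capacity.
        parents = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parents:
                    parents[v] = u
                    queue.append(v)
        return parents

    while True:
        parents = bfs_parents()
        if sink not in parents:
            # No augmenting path remains: the reachable vertices are the community.
            return set(parents) - {source}
        # Find the bottleneck capacity along the augmenting path.
        bottleneck, v = float("inf"), sink
        while parents[v] is not None:
            bottleneck = min(bottleneck, residual[parents[v]][v])
            v = parents[v]
        # Push the flow and update the residual capacities.
        v = sink
        while parents[v] is not None:
            u = parents[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u


def partition_by_membership(pages, communities):
    """Group pages that belong to exactly the same set of communities."""
    groups = defaultdict(set)
    for page in pages:
        signature = frozenset(i for i, c in enumerate(communities) if page in c)
        groups[signature].add(page)
    return list(groups.values())


# Toy graph: two triangles joined by the single link c-x.
edges = [("a", "b"), ("a", "c"), ("b", "c"),
         ("x", "y"), ("x", "z"), ("y", "z"), ("c", "x")]
community = min_cut_community(edges, seeds={"a"}, sink="z")
print(sorted(community))  # ['a', 'b', 'c']
```

In the toy graph the only link between the two triangles is `c`-`x`, so the minimum cut removes that link and the seed's triangle is returned as the community. `partition_by_membership` mirrors the equivalence relation from the abstract: pages fall in the same group exactly when their community memberships coincide.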
Keywords/Search Tags: Web community, Maximum flow/Minimum cut, Community discovery, HITS, Gomory-Hu Tree, Graph partitioning