
Web Community Structure Mining Research And Applications

Posted on: 2009-01-12
Degree: Master
Type: Thesis
Country: China
Candidate: C J Luo
Full Text: PDF
GTID: 2208360245468767
Subject: Computer software and theory
Abstract/Summary:
The Web is a huge information source built from complex hypertext, and it continues to expand rapidly. Many communities have formed on the Web as it has developed, and they have become an important part of how the Web is organized. Communities can provide users with valuable, reliable, and timely information. They reflect the widespread and complex clustering and hierarchical relationships found on the Web. How to discover and use Web communities is a research direction within Web mining.

This thesis reviews the definition and development of Web communities, the concept and classification of Web data mining, and link analysis techniques. It analyzes and compares in detail the classical algorithms for mining Web community structure: the PageRank algorithm, based on importance analysis; the Trawling algorithm, based on bipartite graphs; and the HITS algorithm, based on topic extraction. It then studies in depth the traditional maximum-flow algorithm and the maximum-flow algorithm that assigns edge capacities using HITS, and points out the problems that arise in community mining. Although the traditional maximum-flow algorithm handles the topic drift problem well, it adversely affects the quality and quantity of the communities found. The HITS-based variant simply adds the hub value and the authority value to obtain each edge capacity, so noise pages may be extracted.

To address these problems, this thesis proposes an improved maximum-flow algorithm that assigns edge capacities based on transmission probability. The algorithm quantifies attributes from two different angles, the link connectivity between nodes and the relevance between nodes, and fuses them into a transmission probability. Edge capacities are then assigned according to this probability. Because the transmission probability takes several factors of the nodes into account, the algorithm improves on the original.

Finally, a Web community structure mining system is designed that discovers Web communities using the improved algorithm. A number of experiments show that the system resolves some of the problems traditional algorithms have in community mining and improves the accuracy of Web community mining.
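The HITS-based baseline described in the abstract can be sketched as follows. This is only an illustrative sketch, not the thesis's implementation: it computes hub and authority scores by power iteration, sets each edge capacity to the hub value of the source page plus the authority value of the target page, and takes the source side of a minimum cut as the community. The function names (`hits`, `hits_capacities`, `min_cut_community`), the toy graph, and the use of a single explicit source and sink page are assumptions made for illustration. The improved transmission-probability capacity is not shown, because the abstract does not give its formula.

```python
from collections import deque


def hits(graph, iterations=50):
    """Hub and authority scores by power iteration.

    graph: dict mapping each page to a list of pages it links to.
    Scores are normalised to sum to 1 after every step.
    """
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority of v: sum of hub scores of pages linking to v.
        auth = {n: 0.0 for n in nodes}
        for u, targets in graph.items():
            for v in targets:
                auth[v] += hub[u]
        total = sum(auth.values()) or 1.0
        auth = {n: a / total for n, a in auth.items()}
        # Hub of u: sum of authority scores of pages u links to.
        hub = {n: sum(auth[v] for v in graph.get(n, [])) for n in nodes}
        total = sum(hub.values()) or 1.0
        hub = {n: h / total for n, h in hub.items()}
    return hub, auth


def hits_capacities(graph, hub, auth):
    """Baseline capacity from the abstract: hub(u) + authority(v) per edge."""
    return {(u, v): hub[u] + auth[v]
            for u, targets in graph.items() for v in targets}


def min_cut_community(capacity, source, sink):
    """Edmonds-Karp maximum flow.

    Returns (flow value, set of nodes on the source side of a minimum cut).
    """
    if source == sink:
        raise ValueError("source and sink must differ")
    residual = {}
    for (u, v), c in capacity.items():
        residual.setdefault(u, {})
        residual.setdefault(v, {})
        residual[u][v] = residual[u].get(v, 0.0) + c
        residual[v].setdefault(u, 0.0)  # reverse edge for undoing flow

    flow = 0.0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, c in residual.get(u, {}).items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            # No augmenting path left: reachable nodes form the community.
            return flow, set(parent)
        # Find the bottleneck along the path, then push that much flow.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


# Toy link graph (hypothetical pages): a seed page, two related pages,
# and a distant page leading to the sink.
toy_graph = {
    "seed": ["p1", "p2"],
    "p1": ["p2", "seed"],
    "p2": ["p1", "far"],
    "far": ["sink"],
    "sink": [],
}
toy_hub, toy_auth = hits(toy_graph)
toy_capacity = hits_capacities(toy_graph, toy_hub, toy_auth)
toy_flow, toy_community = min_cut_community(toy_capacity, "seed", "sink")
```

Swapping `hits_capacities` for a different capacity function is the only change needed to try another edge-weighting scheme, which is where a transmission-probability assignment would plug in.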
Keywords/Search Tags: Web community, Web Data Mining, HITS, Maximal-flow, Transmission probability