
Block Based Web Community Identification

Posted on: 2012-11-08
Degree: Master
Type: Thesis
Country: China
Candidate: L Gao
Full Text: PDF
GTID: 2218330368988759
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet, the Web has become the most important platform for people to share information and resources. Mining the characteristics of the Web is essential for obtaining and understanding the information on it. Because the Web is self-organizing, it contains many communities, and how to extract and make use of these communities has become one of the most important research topics in Web data mining.

A Web community is defined as a set of Web pages created by people who share a similar topic of interest. Communities are of great value in Web research: they reflect the social behavior of Web users and the evolution history and inter-relations of the Web. A community provides the most credible resources on a given topic, and it offers an effective way to improve the efficiency of search.

Current community identification research usually assumes that a Web page concerns only one interest, so each node in the Web graph corresponds to one Web page. In practice, however, a single Web page often covers multiple interests and may generate different kinds of links for different interests. If such pages are not segmented, the communities produced by link analysis will contain many irrelevant pages. To solve this problem, we propose a block-based Web community identification algorithm. The algorithm first segments every Web page into blocks, then constructs the Web graph from those blocks, and finally identifies the communities in the block-based Web graph. The experimental results indicate that our approach performs better than the page-based algorithm in avoiding topic drift and improving the accuracy of the extracted communities.
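The abstract does not give the details of the algorithm, so the following is only a minimal sketch of the general idea, not the thesis's implementation. It assumes that segmentation has already been done, that every block has a hypothetical identifier of the form "page#block", that each link gets unit capacity in both directions, and that a community is taken to be the seed's side of a minimum cut found by a max-flow computation (the keywords mention a max-flow algorithm, but the exact formulation here is an assumption). All names and the toy data are illustrative.

```python
from collections import defaultdict, deque

def build_capacity(links):
    """Symmetric capacity map: every link adds 1 in both directions (assumption)."""
    cap = defaultdict(lambda: defaultdict(int))
    for src, targets in links.items():
        for dst in targets:
            cap[src][dst] += 1
            cap[dst][src] += 1
    return cap

def reachable(cap, source):
    """BFS over edges with remaining capacity; returns a node -> parent map."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v, c in cap[u].items():
            if c > 0 and v not in parent:
                parent[v] = u
                queue.append(v)
    return parent

def community(links, seed, sink):
    """Run Edmonds-Karp max flow from seed to sink, then return the
    nodes still reachable from the seed (its side of a minimum cut)."""
    cap = build_capacity(links)
    while True:
        parent = reachable(cap, seed)
        if sink not in parent:
            return set(parent)
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(cap[u][w] for u, w in path)
        for u, w in path:
            cap[u][w] -= bottleneck
            cap[w][u] += bottleneck

def to_page_level(block_links):
    """Collapse block ids ("page#block") to page ids for comparison."""
    page_links = defaultdict(list)
    for src, targets in block_links.items():
        page_links[src.split("#")[0]].extend(t.split("#")[0] for t in targets)
    return page_links

# Hypothetical toy data: page p1 has a topical block and an unrelated ads block.
block_links = {
    "p1#ml": ["p2#main", "p3#main"],
    "p2#main": ["p3#main", "p1#ml"],
    "p3#main": ["p2#main", "shop#main"],
    "p1#ads": ["shop#main"],
    "shop#main": ["p1#ads", "deals#main"],
    "deals#main": ["shop#main"],
}

block_result = community(block_links, seed="p2#main", sink="deals#main")
page_result = community(to_page_level(block_links), seed="p2", sink="deals")
print(sorted(block_result))  # ['p1#ml', 'p2#main', 'p3#main']
print(sorted(page_result))   # ['p1', 'p2', 'p3', 'shop']
```

In this toy graph the page-level result pulls in the unrelated "shop" page through the ads links on p1, while the block-level result keeps only the topical blocks. That is the kind of topic drift the abstract says segmentation is meant to avoid; the numbers here are made up for illustration and say nothing about the thesis's experiments.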
Keywords/Search Tags: Community Identification, Web Page Segmentation, Link Analysis, Max Flow Algorithm