Research on Web Community Discovery Based on Link Similarity

Posted on: 2009-09-09 | Degree: Master | Type: Thesis
Country: China | Candidate: D P Gui | Full Text: PDF
GTID: 2178360272970504 | Subject: Software engineering
Abstract/Summary:
Many communities emerge as the Web develops. These communities are an important source of information on the Web: they can provide prompt and reliable information to users, and they reflect the Web's ubiquitous and complex clustering and hierarchical relationships. Discovering and maintaining communities by hand is costly and difficult, and many potential communities cannot be found this way. Much research therefore addresses discovering communities automatically or semi-automatically, and link analysis is an important approach to discovering potential Web communities. This thesis analyzes the characteristics of the Web link structure and studies the traditional max flow algorithm and the max flow algorithm based on HITS edge capacity assignment, with particular attention to the problems both exhibit in Web mining. The HITS-based algorithm sets the capacity of an edge from the average hub and authority values of its two endpoint pages. This addresses the problem of the traditional algorithm, but deficiencies remain. Building on an analysis of the original definition of linkage similarity, this thesis proposes new definitions of linkage similarity and topic difference, which together describe the relationship between linked pages. The similarity between pages is then computed from these two measures, yielding a more reasonable and effective edge capacity assignment. The similarity between pages is directly proportional to linkage similarity and inversely proportional to topic difference; the more similar two pages are, the larger the capacity of the edge between them. The 24 communities discovered, each with its own topic, show that the max flow algorithm with edge capacities based on this model resolves the problem of the original algorithm well and substantially improves community quality. This thesis provides a new method and new ideas for measuring linkage similarity between pages, and a new strategy for Web community identification based on link analysis. The research is therefore valuable in both theory and practice.
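The abstract gives no formulas, so the following is only a minimal sketch of the idea it describes: edge capacity grows with linkage similarity and shrinks with topic difference, and a max flow computation then separates a community from the rest of the graph. Everything specific here is an assumption made for illustration, not the thesis's definition: the function `edge_capacity`, its `floor` guard against division by zero, the page names, and the similarity and difference scores are all hypothetical. Edmonds-Karp is used as a standard max flow algorithm; the abstract does not say which one the thesis uses.

```python
from collections import defaultdict, deque


def edge_capacity(link_sim, topic_diff, floor=0.1):
    """Hypothetical capacity: directly proportional to linkage similarity,
    inversely proportional to topic difference (floor avoids dividing by 0)."""
    return link_sim / max(topic_diff, floor)


def max_flow_community(capacity, source, sink):
    """Edmonds-Karp max flow.

    Returns (flow value, set of nodes on the source side of the minimum cut).
    `capacity` maps directed edges (u, v) to non-negative capacities.
    """
    residual = defaultdict(lambda: defaultdict(float))
    for (u, v), c in capacity.items():
        residual[u][v] += c
        residual[v][u] += 0.0  # make sure the reverse residual edge exists

    def reachable_from_source():
        # Breadth-first search over edges with remaining capacity.
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    flow = 0.0
    while True:
        parent = reachable_from_source()
        if sink not in parent:
            # No augmenting path left: reachable nodes form the source side.
            return flow, set(parent)
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        flow += bottleneck


# Hypothetical link graph: (page, page, linkage similarity, topic difference).
links = [
    ("a", "b", 0.75, 0.25),
    ("a", "c", 0.75, 0.25),
    ("b", "c", 0.5, 0.25),
    ("b", "x", 0.25, 0.5),
    ("c", "x", 0.25, 0.5),
    ("x", "y", 0.75, 0.25),
]

capacity = {}
for u, v, sim, diff in links:
    # Treat each link as usable in both directions.
    capacity[(u, v)] = edge_capacity(sim, diff)
    capacity[(v, u)] = edge_capacity(sim, diff)

flow, community = max_flow_community(capacity, source="a", sink="y")
print(flow, sorted(community))
```

In this toy graph the pages a, b and c are joined by high-capacity edges, while their links to x have low linkage similarity and high topic difference. The minimum cut therefore falls on those two weak edges, and the source side of the cut, taken as the community around the seed page a, is {a, b, c}.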
Keywords/Search Tags: Web Mining, Web Communities Discovery, Linkage Similarity, Topic Difference, Max Flow Algorithm