
Research On Community Structure Of Complex Network Based Website Clustering

Posted on: 2010-08-13
Degree: Master
Type: Thesis
Country: China
Candidate: L Guan
Full Text: PDF
GTID: 2178360332957860
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet, the exponential growth of information makes it a great challenge to acquire useful information quickly and efficiently. A website is a higher-level abstraction of information structure than a web page, and website clustering, as an important branch of web mining, has attracted increasing attention. The discovery of relationships among websites can be applied to mining online communities, computing website similarity, conducting popularity analysis, and supporting website navigation. Many systems are now modelled as complex networks, such as citation networks, the World Wide Web, and biological networks. Such systems naturally tend to form clusters. This thesis therefore performs website clustering based on the community structure of complex networks.

In link-based website clustering methods, the hierarchical structure of a website is difficult to collect, which becomes the bottleneck in application. In addition, because of the high time complexity of processing text content, content-based clustering approaches are also unsuitable for large-scale data processing.

The main task of this thesis is to apply community detection techniques from complex network research to mine the relationships and groups among Internet websites. The community detection approach models the Internet as a large graph in which each website is a node and an edge between two nodes represents the relationship between the two websites. In this thesis, the number of hyperlinks between two websites characterizes the edge weight, and the weight measure is improved to obtain a better clustering effect. Because outlinks and inlinks affect the relationship between two websites differently, and two websites that point to each other have stronger ties, this reciprocity feature is added to the weight.

Based on an in-depth study of web community structure, a website clustering and navigation system is designed and implemented to visualize the clustering results and present the relationships among websites to users. In addition, based on the clustering results, the physical locations of websites are analyzed by converting website domains to IP addresses, which makes it possible to analyze website relationships across different regions and the significance of a website in a given country or city.

The experimental results and the system show that using community detection techniques from complex networks to conduct website clustering is feasible and that the system is useful for users.
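
The graph model described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's actual implementation: the abstract does not give the weight formula or the detection algorithm, so the function names `build_weights` and `modularity`, the `mutual_bonus` factor of 1.5, and the example domains are all assumptions made for illustration. The sketch merges directed hyperlink counts into undirected edge weights, boosts pairs of websites that link to each other, and scores a candidate grouping with the standard weighted modularity measure.

```python
from collections import defaultdict


def build_weights(link_counts, mutual_bonus=1.5):
    """Turn directed hyperlink counts into undirected edge weights.

    link_counts maps (source_site, target_site) to a hyperlink count.
    mutual_bonus is a hypothetical factor applied when two sites link
    to each other; the thesis abstract does not state the real formula.
    """
    weights = defaultdict(float)
    for (src, dst), count in link_counts.items():
        if src == dst:
            continue  # ignore links from a site to itself
        pair = tuple(sorted((src, dst)))
        weights[pair] += count
    for a, b in list(weights):
        # Reciprocal links indicate a stronger tie, so boost the weight.
        if link_counts.get((a, b), 0) > 0 and link_counts.get((b, a), 0) > 0:
            weights[(a, b)] *= mutual_bonus
    return dict(weights)


def modularity(weights, community_of):
    """Weighted modularity Q = sum over c of L_c/m - (d_c / 2m)^2.

    L_c is the edge weight inside community c, d_c is the total
    strength of its nodes, and m is the total edge weight.
    """
    total = sum(weights.values())
    if total == 0:
        return 0.0
    internal = defaultdict(float)
    strength = defaultdict(float)
    for (a, b), w in weights.items():
        strength[community_of[a]] += w
        strength[community_of[b]] += w
        if community_of[a] == community_of[b]:
            internal[community_of[a]] += w
    return sum(
        internal[c] / total - (strength[c] / (2 * total)) ** 2
        for c in strength
    )


# Hypothetical hyperlink counts between four example sites.
links = {
    ("a.example", "b.example"): 2,
    ("b.example", "a.example"): 1,
    ("c.example", "d.example"): 4,
    ("b.example", "c.example"): 1,
}
weights = build_weights(links)
grouping = {"a.example": 0, "b.example": 0, "c.example": 1, "d.example": 1}
score = modularity(weights, grouping)
```

A community detection algorithm would search for the grouping that maximizes a score such as this one. The sketch only evaluates a grouping that is supplied by hand, which keeps the example short and independent of any particular detection method.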
Keywords/Search Tags:website clustering, complex network, community structure, web mining