
Sensitive Community Discovery Based On Web Structure Mining Algorithms

Posted on: 2008-10-21
Degree: Master
Type: Thesis
Country: China
Candidate: Q Wang
GTID: 2178360242472213
Subject: Computer software and theory
Abstract/Summary:
The World Wide Web has grown dramatically in recent years, giving people access to more information across more fields than ever before. People are no longer content with simply retrieving information from the Web; they want the information they are genuinely interested in. Beyond that, they want to learn where information originates and what social networks lie behind the Web, especially the clues to social issues that are implicitly exposed there. A Web Sensitive Community (WSC) is a collection of hosts that focus on a similar topic and come together spontaneously. Discovering WSCs can help people understand Web information more precisely, grasp the social relationships behind the Web, and even uncover unexpected social behavior. WSC discovery has therefore become a new research focus in web intelligence.

To extract WSCs, we carried out research in web information collection, web structure mining, and WSC extraction. The main contributions are:

1. The Three-Layer Web Structure Model. After analyzing website structure together with the visual characteristics of web page blocks, we propose a Three-Layer Web Structure Model composed of a Host Layer, a Page Layer, and a Block Layer. Because viewing the Web through this three-layer structure, rather than the traditional flat Page Layer, takes more factors into account, mining results based on it could be improved significantly.

2. The design and implementation of a Web Collection System. To collect the information that the Three-Layer Web Structure requires, we design an original system that collects data on hosts, pages, and blocks, as well as the relationships among them.

3. Three-Layer Based Web Structure Mining Algorithms. We adapt the traditional web structure mining algorithms (PageRank, HITS, and SALSA) to fit the Three-Layer Web Structure Model in order to improve precision. The experiments in this thesis demonstrate the effectiveness of these algorithms.

4. Web Sensitive Community Extraction. After obtaining authority web pages with the Three-Layer Based Web Structure Mining Algorithms, we propose a series of approaches for finding the Web Sensitive Community behind these authority pages. We also calculate the statistical characteristics of each community member and examine each member's physical location.

Finally, we summarize our work and discuss future directions for web mining technology applied to social analysis.
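
The abstract does not spell out how PageRank is adapted to the three layers, so the following is only an illustrative sketch, not the thesis's algorithm. It runs plain PageRank (power iteration) on a page-level link graph and then, as an assumption made here, lifts the scores to the Host Layer by summing them per hostname. The function names, the toy link graph, and the sum-based aggregation are all hypothetical; the Block Layer is not modeled.

```python
from collections import defaultdict
from urllib.parse import urlparse


def pagerank(links, damping=0.85, iterations=50):
    """Plain PageRank by power iteration over {page: [outlinked pages]}."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the teleportation share first.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


def host_scores(page_rank):
    """Assumed aggregation: sum page scores per hostname (Host Layer)."""
    scores = defaultdict(float)
    for url, score in page_rank.items():
        scores[urlparse(url).netloc] += score
    return dict(scores)


# Hypothetical toy link graph (page URL -> outlinked page URLs).
links = {
    "http://a.example/index": ["http://a.example/news", "http://b.example/home"],
    "http://a.example/news": ["http://b.example/home"],
    "http://b.example/home": ["http://a.example/index"],
    "http://c.example/blog": ["http://b.example/home"],
}

ranks = pagerank(links)
hosts = host_scores(ranks)
for host, score in sorted(hosts.items(), key=lambda kv: -kv[1]):
    print(f"{host}: {score:.4f}")
```

Summing per host is the simplest possible aggregation; a model that weights links by the block they appear in, as the Block Layer suggests, would need per-link weights in place of the uniform `share` used above.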
Keywords/Search Tags: Information Retrieval, Social Network Analysis, Web Structure Mining, Web Sensitive Community, Three-Layer Based Web Structure Model