
Research And Realization On Web Community Mining Algorithm

Posted on: 2010-10-13
Degree: Master
Type: Thesis
Country: China
Candidate: J Liu
GTID: 2178360302959041
Subject: Computer application technology
Abstract/Summary:
The Web is a huge information resource whose content is complex and whose forms are varied. Web community mining discovers the Web communities that gather on the Web around a given query topic, so that users can quickly extract the knowledge they need from the Internet. Web community mining techniques make it possible to strengthen existing search and browsing techniques by using link information effectively. They are also important for search engines, automatic classification of content portals, and Internet content filtering. Starting from different definitions of a Web community, and building on an in-depth study of Web community mining techniques, this thesis proposes a new Web community mining method.

First, we studied the seed-page discovery step of the MaxFlow algorithm. That step is affected by the user's subjective judgment, whereas the HITS algorithm can discover authority pages. A new algorithm, PHITS, was therefore proposed. PHITS first improves the construction of the neighborhood graph. It then computes the authority value and hub value of every page with a new formula. Finally, it extracts the pages with high authority values.

Second, we mined Web communities with the MaxFlow algorithm. The seed pages were those discovered by PHITS. A relatively strict Web community definition was applied, which constrains both the pages inside the community and the pages outside it. The entire process is called the PH-MaxFlow algorithm.

Third, a discovered Web community has traditionally been evaluated subjectively, by the user judging the results against the query topic, so subjective factors enter the evaluation.
To remove this user bias, we proposed an explicit evaluation formula that measures the degree of correlation between the results and the query topic. The formula combines the way a Web community is formed with graph partitioning.

Finally, a simple Web community search system was built. The above algorithms were validated on it, and the results are reported.
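The PH-MaxFlow pipeline described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation.

The abstract does not give PHITS's modified neighborhood graph or its new scoring formula. Classic HITS (Kleinberg's hub/authority iteration) therefore stands in for PHITS here.

The max-flow step follows the construction of Flake, Lawrence and Giles, which works as follows:

- A virtual source is joined to the seeds with unlimited capacity.
- Every other page is joined to a virtual sink with capacity 1.
- Hyperlinks are treated as undirected edges of capacity k, which defaults here to the number of seeds.

This sketch does not encode the thesis's stricter community definition. The function names `hits` and `min_cut_community`, the toy graph `web`, and the top-3 seed cutoff are all hypothetical.

```python
import math
from collections import defaultdict, deque


def hits(graph, iterations=50):
    """Classic HITS (stand-in for PHITS). graph maps page -> set of linked pages."""
    nodes = set(graph) | {v for outs in graph.values() for v in outs}
    auth = {n: 1.0 for n in nodes}
    hub = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # authority(p) = sum of the hub scores of the pages linking to p
        new_auth = {n: 0.0 for n in nodes}
        for u, outs in graph.items():
            for v in outs:
                new_auth[v] += hub[u]
        # hub(p) = sum of the updated authority scores of the pages p links to
        new_hub = {n: sum(new_auth[v] for v in graph.get(n, ())) for n in nodes}
        # L2-normalise both vectors so the iteration converges
        a_norm = math.sqrt(sum(x * x for x in new_auth.values())) or 1.0
        h_norm = math.sqrt(sum(x * x for x in new_hub.values())) or 1.0
        auth = {n: x / a_norm for n, x in new_auth.items()}
        hub = {n: x / h_norm for n, x in new_hub.items()}
    return auth, hub


def min_cut_community(graph, seeds, link_capacity=None):
    """Source side of an s-t minimum cut around the seeds (Flake-style, assumed)."""
    source, sink = "__source__", "__sink__"  # hypothetical virtual nodes
    inf = float("inf")
    k = float(len(seeds)) if link_capacity is None else float(link_capacity)
    nodes = set(graph) | {v for outs in graph.values() for v in outs}
    cap = defaultdict(lambda: defaultdict(float))  # residual capacities
    # hyperlinks become undirected edges of capacity k
    for u, outs in graph.items():
        for v in outs:
            if u != v:
                cap[u][v] = k
                cap[v][u] = k
    for s in seeds:  # the source feeds every seed without limit
        cap[source][s] = inf
    for n in nodes - set(seeds):  # every non-seed page drains 1 unit to the sink
        cap[n][sink] = 1.0

    def bfs(stop_at_sink):
        # breadth-first search over edges with positive residual capacity
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    if stop_at_sink and v == sink:
                        return parent
                    queue.append(v)
        return parent

    # Edmonds-Karp: augment along shortest paths until the sink is unreachable
    while sink in (parent := bfs(True)):
        path_flow, v = inf, sink
        while parent[v] is not None:
            path_flow = min(path_flow, cap[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= path_flow
            cap[v][u] += path_flow
            v = u
    # pages still reachable from the source form the community
    return set(bfs(False)) - {source}


# Hypothetical toy graph: a dense cluster a1..a4 and a 3-cycle b1..b3,
# joined only by the link a4 -> b1.
web = {
    "a1": {"a2", "a3", "a4"},
    "a2": {"a1", "a3", "a4"},
    "a3": {"a1", "a2", "a4"},
    "a4": {"a1", "a2", "a3", "b1"},
    "b1": {"b2"},
    "b2": {"b3"},
    "b3": {"b1"},
}
auth, hub = hits(web)
seeds = sorted(auth, key=auth.get, reverse=True)[:3]  # top authorities as seeds
print(sorted(min_cut_community(web, seeds)))  # ['a1', 'a2', 'a3', 'a4']
```

The seeds are the three highest-authority pages, which mirrors the step of extracting high-authority pages. The cut then keeps a4, most of whose links stay inside the dense cluster. The loosely attached b-pages fall on the sink side because the single a4-b1 edge saturates.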
Keywords/Search Tags: Web community, Maximum flow, Community mining, HITS, Seed-page