
The Technology Of Topical Web Mining Based On Machine Learning

Posted on: 2008-06-09
Degree: Master
Type: Thesis
Country: China
Candidate: N Zhang
Full Text: PDF
GTID: 2178360215972137
Subject: Computer software and theory
Abstract/Summary:
With the rapid growth of Web information resources, how to extract potentially valuable information from the network has attracted more and more attention. Confronted with this huge, heterogeneous and semi-structured information repository, Web users often have to spend a lot of time and effort to find the information they need, and even then they may fail in many cases. Topical Web Mining, a research direction that has emerged in recent years, offers a new way to address this problem.

The main contributions of the thesis can be summarized as follows:

1. This thesis studies and analyses Web Mining and Machine Learning. Web Mining is divided into three branches according to the object being mined: Web Content Mining, Web Structure Mining and Web Usage Mining. Based on the distribution of topical Web pages on the Web, Topical Web Mining collects the Web pages related to a given topic, then analyses and processes them with intelligent methods. Machine Learning is an important branch of artificial intelligence. The thesis presents the model of Machine Learning, its classification, and its development, and describes the application of Machine Learning in the field of Web Mining.

2. A Web crawler is an automated program that recursively traverses the Web, downloading Web pages and analysing their content. How to control the crawler's crawling strategy effectively is one of the most important factors that influence Web Mining. Drawing on Machine Learning, and using the theory of learning from negative examples, we propose a new crawling strategy. The experimental results show that this strategy can increase the harvest rate for queries.

3. Calculating the authority ratio of a Web page is an important issue for Web Mining. Based on the HITS algorithm, we propose a new algorithm for calculating page importance: the WHITS algorithm.

4. We have designed and implemented a Topical Web Mining System based on Machine Learning. The system collects pages according to the user's requests, calculates the importance of those Web pages, and finally returns the topical pages to the user. The system can also be adjusted according to the user's feedback.
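The abstract does not describe how WHITS modifies HITS, so the sketch below shows only the standard HITS iteration that it builds on. It is a minimal illustration, not the thesis's algorithm. The function name `hits`, the dictionary-based link graph, and the fixed iteration count are all assumptions made for this example.

```python
import math


def hits(graph, iterations=50):
    """Standard HITS on a link graph (illustrative sketch, not WHITS).

    graph: dict mapping each page to the list of pages it links to.
    Returns two dicts (hub, auth) with L2-normalised scores.
    """
    # Collect every page that appears as a source or as a link target.
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)

    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}

    for _ in range(iterations):
        # Authority update: sum the hub scores of pages linking to each page.
        new_auth = {n: 0.0 for n in nodes}
        for src, targets in graph.items():
            for dst in targets:
                new_auth[dst] += hub[src]
        norm = math.sqrt(sum(v * v for v in new_auth.values())) or 1.0
        auth = {n: v / norm for n, v in new_auth.items()}

        # Hub update: sum the authority scores of the pages each page links to.
        new_hub = {n: sum(auth[d] for d in graph.get(n, ())) for n in nodes}
        norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        hub = {n: v / norm for n, v in new_hub.items()}

    return hub, auth


# Hypothetical three-page graph: pages "a" and "b" both link to "c".
example_graph = {"a": ["c"], "b": ["c"], "c": []}
hub_scores, auth_scores = hits(example_graph)
```

In this toy graph, page "c" receives all of the authority weight, and pages "a" and "b" share the hub weight equally. A weighted variant would change how the sums in the two update steps are formed, but the abstract gives no details on that.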
Keywords/Search Tags:Web Mining, Machine Learning, Topical Crawler, HITS