Research on Sensitive Web Page Discovery Technology

Posted on: 2003-01-18
Degree: Master
Type: Thesis
Country: China
Candidate: H X Hu
GTID: 2208360065962275
Subject: Signal and Information Processing
Abstract/Summary:
With the development of networks and the spread of the Internet, the Web has become a huge distributed information space. It is a valuable information source for the public and a new channel for the army to obtain information. Web information retrieval tools, however, have not developed as quickly as the Web itself, and information overload is a major obstacle to discovering information on the Web effectively.

The Web interesting-page discovery system is part of the research on Web-based intelligent interesting-information discovery technology, a project of our college. The goal of the system is to discover Web pages that contain interesting information accurately and in a timely manner. According to the user's demands, it automatically discovers related pages and sends them to the user, who then scores them. From this feedback, the system learns the user's interests and personalizes the information it delivers. This paper introduces the basic theory, implementation and performance of the system. The research focuses on the following:

1. HTML documents are semi-structured data. To take this into account, the TFIDF term-weighting algorithm of the VSM is modified so that HTML documents are processed more effectively.
2. The ratio of text to hyperlinks is introduced to distinguish content pages from catalog pages. The aim is to ensure that all discovered results are content pages, which greatly eases the user's browsing burden.
3. The discovered results are clustered in a simple way to ensure that the results differ from one another.
4. The system uses feedback-based machine learning to learn the user's interests. Different kinds of feedback are processed differently, which effectively handles drift in the user's interests.

The test data indicate that the sub-system is stable and performs well.
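
The text-to-hyperlink ratio mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact definition or threshold used in the thesis, so everything below is an assumption made for illustration: the ratio is taken as the share of non-whitespace characters that lie outside <a> elements, and the names TextLinkCounter, text_link_ratio, is_content_page and the 0.5 threshold are hypothetical.

```python
from html.parser import HTMLParser


class TextLinkCounter(HTMLParser):
    """Counts text characters inside and outside <a> elements (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.link_depth = 0   # > 0 while inside an <a> element
        self.skip_depth = 0   # > 0 while inside <script> or <style>
        self.link_chars = 0   # characters of anchor text
        self.text_chars = 0   # characters of ordinary text

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.link_depth += 1
        elif tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.link_depth > 0:
            self.link_depth -= 1
        elif tag in ("script", "style") and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth:
            return  # script and style bodies are not visible text
        n = len("".join(data.split()))  # count non-whitespace characters only
        if self.link_depth:
            self.link_chars += n
        else:
            self.text_chars += n


def text_link_ratio(html):
    """Share of text characters that are NOT anchor text (assumed definition)."""
    counter = TextLinkCounter()
    counter.feed(html)
    counter.close()
    total = counter.text_chars + counter.link_chars
    return counter.text_chars / total if total else 0.0


def is_content_page(html, threshold=0.5):
    """Treat a page as a content page when ordinary text dominates.

    The 0.5 threshold is an arbitrary illustrative value, not one from the thesis.
    """
    return text_link_ratio(html) >= threshold
```

Under these assumptions, a page made up mostly of paragraphs scores close to 1 and is kept as a content page, while a page that is essentially a list of links scores close to 0 and is treated as a catalog page.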
Keywords/Search Tags: Web, Information Retrieval, Agent, VSM, Machine Learning