Web-based Topic Search, Applied Technology Research

Posted on: 2008-08-09
Degree: Master
Type: Thesis
Country: China
Candidate: D S Xie
Full Text: PDF
GTID: 2208360215967144
Subject: Computer application technology
Abstract/Summary:
Since the start of the new century, web space on the Internet has grown ever more rapidly. As web information comes into widespread use, people's demands on its domain specificity, authenticity, relevance, and timeliness are higher than before. Web-based topic-specific search technology has therefore become the key to effectively unlocking the Internet's information mine. At present, web information search services are mainly provided by search engine sites, which are mature both domestically and abroad. However, the web crawling systems widely used by search engines have limitations: they follow hyperlinks only, and so reach just the publicly indexable part of the web. This method does not work for the hidden web, which makes up the majority of web space. Hidden web content, which usually has a clear topic, is typically generated when a user interacts with a web information database through a query form. To address this, the thesis focuses on techniques for crawling information from the hidden web. Building on a study of related crawlers at home and abroad, it summarizes principles for discovering and filtering data source interface forms that suit the Chinese web. Based on the idea that a form's domain is determined by its relevance to a domain ontology, including the domain relevance of its elements, a new algorithm is proposed for automatically identifying domain-specific forms. The thesis designs and describes a crawler system capable of crawling the hidden web, covering the overall architecture and the division into functional modules. It analyzes and handles the specific methods and algorithms for processing query forms and selecting preferred query words. To demonstrate the idea, the author implemented an experimental crawler, built on an education-domain word database, that targets the hidden web of the education domain. Practical crawling tests on many education sites show that the system is effective.
Keywords/Search Tags:Hidden web, Crawler, Form, Queries
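
The abstract's idea of judging a form's domain by the relevance of its elements to a domain vocabulary can be illustrated with a minimal sketch. This is not the thesis's algorithm, which is not given here. The vocabulary EDUCATION_TERMS, the function names, the scoring rule (fraction of form tokens found in the vocabulary), and the 0.3 threshold are all assumptions made for illustration. The tokenizer handles English words only; Chinese pages, which the thesis targets, would need a word segmenter instead.

```python
from html.parser import HTMLParser
import re

# Hypothetical education-domain vocabulary (assumption); the thesis relies on
# its own education-domain word database, which is not reproduced here.
EDUCATION_TERMS = {"course", "school", "university", "teacher",
                   "student", "major", "exam"}


def tokenize(text):
    """Split text into lowercase alphabetic tokens (English only)."""
    return re.findall(r"[a-z]+", text.lower())


class FormTextExtractor(HTMLParser):
    """Collects label text and field names that appear inside <form> elements."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # number of currently open <form> tags
        self.tokens = []  # tokens gathered from form elements

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self.depth += 1
        elif self.depth and tag in ("input", "select", "textarea"):
            for key, value in attrs:
                if key in ("name", "id", "placeholder") and value:
                    self.tokens.extend(tokenize(value))

    def handle_endtag(self, tag):
        if tag == "form" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        # Only text that sits inside a form counts toward its relevance.
        if self.depth:
            self.tokens.extend(tokenize(data))


def form_relevance(html, vocabulary=EDUCATION_TERMS):
    """Return the fraction of form tokens that belong to the vocabulary."""
    parser = FormTextExtractor()
    parser.feed(html)
    if not parser.tokens:
        return 0.0
    hits = sum(1 for token in parser.tokens if token in vocabulary)
    return hits / len(parser.tokens)


def is_domain_form(html, threshold=0.3):
    """Treat a form as domain-specific when its relevance meets the threshold."""
    return form_relevance(html) >= threshold
```

In a crawler, a check like is_domain_form would sit between form discovery and query submission, so that only forms judged relevant to the target domain are filled in with query words.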