
Crawler And Incremental Update Strategy Research In Deep Web

Posted on: 2011-04-27 | Degree: Master | Type: Thesis
Country: China | Candidate: M Guo | Full Text: PDF
GTID: 2178360305984869 | Subject: Computer application technology
Abstract/Summary:
The 21st century is a web-based, knowledge-based society with high technology at its core. The Internet is increasingly important in our lives, and more and more people search it for information. Users are becoming increasingly dependent on search engines and place ever higher demands on results that are "specialized", "sophisticated" and "deep", demands that traditional web search engines have failed to meet. Compared with ordinary web pages, the Deep Web holds a larger amount of information, with more specific themes, better structure, and higher quality. Searching Deep Web resources effectively provides users with more valuable information.
Deep Web search needs to overcome the limitations of traditional search technology: it must automatically identify searchable databases on the Internet, submit search requests through their search interfaces, analyze the returned query results, and then extract useful information and return it to the user after data processing. With the aim of deeper search and more professional information, we review the pivotal related studies on the Deep Web and present a cloud-computing-based vertical search system for the Deep Web. The system obtains structured information from massive, disordered Internet resources and provides professional, specific, in-depth retrieval services. We use a simple model, text feature vectors, and related techniques to classify the various kinds of web pages. The experiments verify the efficiency of the web crawling and the accuracy of the page classification.
Additionally, we describe an incremental update crawler system for the Deep Web. Unlike other studies, which take into account only the importance of a page or its update frequency, this thesis applies different algorithms to different types of URL in order to achieve reasonable incremental updates of the data.
Experimental data show that the incremental update algorithm is feasible. As dynamic web pages change, the system parameters are updated automatically on each visit according to the last saved state, so the update frequency and update scope adjust themselves and efficiency improves.
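The self-adjusting update frequency described above can be illustrated with a minimal sketch. It is not the thesis's algorithm: the names (PageState, record_visit), the content-hash change test, and the halve/1.5x adjustment factors are all assumptions chosen only to show how a saved per-URL state can drive the next revisit interval.

```python
import hashlib
from dataclasses import dataclass

# Assumed bounds on the revisit interval, in hours (illustrative values).
MIN_INTERVAL_H = 1.0
MAX_INTERVAL_H = 24.0 * 30


@dataclass
class PageState:
    """Hypothetical saved state for one URL between crawls."""
    url: str
    content_hash: str = ""    # fingerprint saved at the last visit
    interval_h: float = 24.0  # current revisit interval in hours


def fingerprint(content: str) -> str:
    """Return a stable fingerprint of the page content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def record_visit(state: PageState, content: str) -> bool:
    """Compare the page with its last saved state and adjust the interval.

    Assumed policy: halve the interval when the page changed, lengthen it
    by 1.5x when it did not. A first visit (empty saved hash) counts as a
    change. Returns True if the content changed.
    """
    new_hash = fingerprint(content)
    changed = new_hash != state.content_hash
    if changed:
        state.interval_h = max(MIN_INTERVAL_H, state.interval_h / 2)
    else:
        state.interval_h = min(MAX_INTERVAL_H, state.interval_h * 1.5)
    state.content_hash = new_hash
    return changed
```

Under this policy, pages that change often are revisited more frequently and stable pages less often. A scheme closer to the thesis would presumably select a different adjustment rule per URL type, which the abstract does not specify.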
Keywords/Search Tags: deep web, vertical search, text classification, information extraction, incremental update