Architectural Design And Implementation Of Downloadable Resource Oriented Web Search Engine System

Posted on: 2006-08-19
Degree: Master
Type: Thesis
Country: China
Candidate: Y B Liu
Full Text: PDF
GTID: 2168360155962007
Subject: Software engineering
Abstract/Summary:
Search engines let people find the information they need on the Internet quickly. General-purpose search engines such as Google serve most information needs well, but they have two flaws when used to search for downloadable resources: in most cases the returned links do not point directly to the resource, and some returned links are dead. To solve these problems, this thesis proposes a new search engine dedicated to searching for downloadable resources.

By analyzing the characteristics of downloadable resources, this thesis identifies the differences between web pages and downloadable resources: a web page is only a signpost to a downloadable resource and cannot guarantee that users can actually obtain it, and a popular web site does not always supply good downloadable resources. Taking these differences into account is important for improving search engine performance when searching for downloadable resources.

Based on these characteristics, this thesis proposes a new search engine named SureDown. Its information collection strategy is built around the characteristics of downloadable resources. The crawlers visit the links that point to downloadable resources to test whether the resources can be downloaded, and then download the pages related to those resources. The indexer builds indexes for the pages collected by the crawlers. The pages are first preprocessed to create resource description files, which helps build the indexes more quickly. The sorter calculates a rank score for every resource description file by comparing the query words with the text in the file, and sorts the files by rank score. The user interface returns the goal links contained in the resource description files to the user in the same order as the files.

Finally, we build a prototype to test SureDown. The test results show that SureDown is a viable way to improve on the performance of general search engines when searching for downloadable resources.
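
The indexing and sorting steps described in the abstract can be illustrated with a minimal Python sketch. It is not SureDown's implementation: the abstract does not give the scoring formula or the layout of a resource description file, so the `ResourceDescription` record, the `tokenize` helper, the term-frequency score, and the example URLs below are all assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class ResourceDescription:
    """Hypothetical stand-in for a resource description file."""
    goal_link: str  # direct link the crawler has verified as downloadable
    text: str       # text taken from the pages related to the resource


def tokenize(text):
    # Lowercased alphanumeric runs; a real system would need proper segmentation.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(descriptions):
    # Maps each term to {description id: term frequency in that description}.
    index = defaultdict(dict)
    for doc_id, desc in enumerate(descriptions):
        for term, count in Counter(tokenize(desc.text)).items():
            index[term][doc_id] = count
    return index


def search(query, descriptions, index):
    # Assumed score: sum of the frequencies of the distinct query terms.
    scores = Counter()
    for term in set(tokenize(query)):
        for doc_id, count in index.get(term, {}).items():
            scores[doc_id] += count
    # Highest score first; ties keep the original description order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [descriptions[doc_id].goal_link for doc_id, _ in ranked]


descriptions = [
    ResourceDescription("http://example.com/files/editor-1.2.zip",
                        "Free text editor download, editor for Windows"),
    ResourceDescription("http://example.com/files/player.exe",
                        "Media player download"),
    ResourceDescription("http://example.com/files/notes.pdf",
                        "Lecture notes on compilers"),
]
index = build_inverted_index(descriptions)
results = search("editor download", descriptions, index)
```

Here `results` lists the goal links in rank order, mirroring how the user interface returns links in the same order as the sorted description files. Descriptions that match no query word are left out of the result.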
Keywords/Search Tags: Internet, search engine, crawler, downloadable resource, inverted index