The Design And Implementation Of Intelligent Information Retrieval System

Posted on:2014-07-29

Degree:Master

Type:Thesis

Country:China

Candidate:X H Wang

Full Text:PDF

GTID:2268330422957490

Subject:Electronic and communication engineering

Abstract/Summary:

Search engines are indispensable tools that gather information and provide search services for Internet users, helping them obtain information from the Internet quickly. Under the impact of "big data", however, which has increased both the volume and the diversity of information resources, search engines face new challenges in speed and topic relevance. As a result, the next generation of search engines is currently a hot research topic.

When a search engine's crawler gathers information from Internet resources, its capacity for handling the queue of URL strings is insufficient. This thesis designs and implements a hash algorithm to process this data quickly. The algorithm builds a one-to-one relationship between values and keys, so that the string data in the spider's queue is quickly converted into a linear table structure. This improves both the crawler's ability to handle the string queue and its overall performance. A search engine network environment was then built with Heritrix as the web crawler framework, and the hash algorithm was added to the crawler for testing. The experimental results show that, after the hash algorithm was added to the web crawler, search efficiency and fetching speed improved noticeably.

Search engines also perform poorly in returning results that are relevant to the topic. The remedy is to improve the topic relevance of the pages that the web crawler captures. The thesis incorporates a genetic algorithm into the crawler so that it fetches content of a specific topic type and ignores content unrelated to the theme. The strategy combines the genetic algorithm with a content-based vector space model: the global optimization property of the genetic algorithm ensures the completeness of the crawl, the relationships between web pages determine the importance of each page, and the vector space model identifies relevance to the theme.

After the modification, tests and comparisons with fixed keywords show that both the total number of pages and the number of on-topic pages increased, and the proportion of on-topic pages rose by about 30%, improving the accuracy of the information the system returns.
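
The two ideas in the abstract, hashing URL strings in the crawler queue and scoring pages against a topic with a vector space model, can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not specify the hash function or the term weighting, so SHA-1 (truncated to 64 bits) and raw term-frequency vectors are assumptions here, and the names `url_key`, `UrlFrontier`, `term_vector`, `cosine_similarity`, and `is_on_topic` are hypothetical. The genetic algorithm component and the Heritrix integration are not shown.

```python
import hashlib
import math
import re
from collections import Counter, deque


def url_key(url: str) -> int:
    """Map a URL string to a fixed-size integer key.

    Assumption: SHA-1 truncated to 64 bits. Collisions are extremely
    unlikely at crawl scale but not strictly impossible.
    """
    digest = hashlib.sha1(url.strip().encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class UrlFrontier:
    """FIFO crawl queue that uses hashed keys to skip URLs already enqueued."""

    def __init__(self) -> None:
        self._queue = deque()
        self._seen = set()

    def push(self, url: str) -> bool:
        """Enqueue the URL unless its key was seen before; return True if added."""
        key = url_key(url)
        if key in self._seen:
            return False
        self._seen.add(key)
        self._queue.append(url)
        return True

    def pop(self) -> str:
        """Return the oldest URL still waiting in the queue."""
        return self._queue.popleft()

    def __len__(self) -> int:
        return len(self._queue)


def term_vector(text: str) -> Counter:
    """Build a raw term-frequency vector from lowercase alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-frequency vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_on_topic(page_text: str, topic_text: str, threshold: float = 0.2) -> bool:
    """Treat a page as on-topic when its similarity to the topic meets the threshold.

    The threshold value is an illustrative assumption, not a figure from the thesis.
    """
    return cosine_similarity(term_vector(page_text), term_vector(topic_text)) >= threshold
```

In a focused crawler along these lines, each fetched page would be scored with `is_on_topic`, and the outgoing links of on-topic pages would be offered to `UrlFrontier.push`, which silently drops URLs whose key is already known. Comparing integer keys in a set avoids repeated string comparisons across the whole queue, which is the kind of speed-up the abstract attributes to its hash algorithm.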

Keywords/Search Tags:

Search engine, Crawler queue, Hash algorithm, Genetic algorithm, Topic relevance

Related items

1	Research And Design On The Search Strategy Of Focused Crawler Based On Genetic Algorithm
2	Research On An Algorithm Of Focused Crawler In Vertical Search Engine
3	Research On Focused Crawler Technology Of Vertical Search Engine
4	Design And Implementation Of Vertical Search Engine Based On Web Crawler
5	Research And Design Of Vertical Search Engine Web Crawler
6	Research And Design Of Vertical Search Engine Based On Fish-search Algorithm
7	The Research Of Topical Crawler Search Strategy In Web Page
8	The Research And Application Of Vertical Search Engine In The Field Of Group-purchasing Web
9	Research And Implementation Of Scientific Topic Search Engine Crawler Based On Nutch
10	The Research And Implementation Of Topical Web Crawler Based On Improved Shark-Search Algorithm