
Design & Practice Of Topic-Specific Search Engine System

Posted on: 2004-01-01
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Hou
Full Text: PDF
GTID: 2168360092995623
Subject: Library science
Abstract/Summary:
The Internet is becoming the largest information base, yet the Web is filled with an enormous mass of pages. Helping people obtain information quickly and accurately is a major problem facing information scientists. This thesis discusses web information retrieval technology in both theory and application, and puts forward a framework for the iRobot system.

The thesis is divided into three sections. The first section is an introduction, in which I discuss the present state of web information retrieval and analyze some existing problems. In this part I also review the related work of other researchers in China and worldwide. The second section analyzes the theory of IR and the technology of search engines. It also discusses some key technologies used in web information retrieval, including web information indexing, information filtering, web mining, ontology, intelligent agents, and web search algorithms. The last section covers the implementation of iRobot and describes the framework of the system in detail.

iRobot is a topic-specific dynamic search system with a built-in professional lexicon; its aim is to retrieve information for specialists or special committees. The kernel of iRobot is divided into three parts. The first part is the initialization part. My basic idea is to automatically obtain an initial set of 'good' links to pages that are relevant to the user's query, and to continue the exploration from this set. I first submit the query to the metasearch engine (an attachment of iRobot) and obtain a first set of 'good' links. Then I distill this set with a simplified HITS algorithm to form the final set of 'good' links.

The second part is the search module. iRobot revises the Fish search algorithm by adding analysis of the keywords in context. Using this revised Fish algorithm, which also uses a similarity-based relevance score instead of binary evaluation, iRobot crawls the web. To increase crawling speed, I introduce multithreading into the system.

The last part is the result processing module. iRobot prunes useless content from each page, such as advertisements. The system then classifies the pages retrieved by the robot and stores them in the database to offer users the final information. iRobot receives relevance feedback on the results and revises the keywords and the initial link set accordingly.

Finally, the thesis summarizes the experience of designing the iRobot system.
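The distillation step in the first part of the kernel can be illustrated with a small sketch. The abstract does not say how iRobot simplifies HITS, so the code below is only a plain textbook HITS iteration over an in-memory link graph, written in Python as an assumption about what such a step might look like. The names `hits`, `normalize`, `distill`, and the `keep` parameter are hypothetical and do not come from the thesis.

```python
from math import sqrt


def normalize(scores):
    """Scale scores to unit Euclidean length (no-op if all are zero)."""
    norm = sqrt(sum(v * v for v in scores.values())) or 1.0
    return {page: v / norm for page, v in scores.items()}


def hits(links, iterations=20):
    """Textbook HITS on a dict mapping each page to the pages it links to.

    Returns (hub, authority) score dicts. This is NOT the thesis's
    simplified variant, whose details the abstract does not give.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority update: sum the hub scores of pages linking in.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]
        # Hub update: sum the authority scores of pages linked to.
        new_hub = {
            p: sum(new_auth[dst] for dst in links.get(p, ()))
            for p in pages
        }
        auth = normalize(new_auth)
        hub = normalize(new_hub)
    return hub, auth


def distill(links, keep=2):
    """Keep the `keep` pages with the highest authority (hypothetical rule)."""
    _, auth = hits(links)
    return sorted(auth, key=lambda p: (-auth[p], p))[:keep]


# Toy link graph standing in for metasearch results (made-up page names).
toy_links = {"a": ["c", "d"], "b": ["c"], "e": ["c"], "d": []}
good_links = distill(toy_links, keep=2)
```

In this toy graph, page `c` is linked from three pages and `d` from one, so those two rank highest by authority and would seed the crawl under this assumed selection rule.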
Keywords/Search Tags:web information retrieval, information gathering, Search Engine, Topic-Specific Search Engine, iRobot system