Design And Implementation Of A Focused Search Engine Template Based On Lucene

Posted on: 2012-07-17 | Degree: Master | Type: Thesis
Country: China | Candidate: X D Yang | Full Text: PDF
GTID: 2178330332483295 | Subject: Communication and Information System
Abstract/Summary:
As more and more information becomes available on the World Wide Web, providing effective search tools for information retrieval becomes more difficult. Because the Web is large and dynamic, universal search engines can crawl and index only a portion of it, and they struggle to offer users comprehensive and timely search results. In contrast to universal search engines, which attempt to index the whole Web, focused search engines cover only the areas of the Web related to a given topic. They can therefore crawl those areas more deeply and in a shorter period. Focused search engines use rich context information and effective crawling strategies to guide the search toward highly relevant target pages. The design and implementation of focused search engines is going through a highly creative phase, and much work from machine learning is being applied to the task.

In this thesis, the author surveys the state of the art in focused search engines and studies related methods for implementing one. At the same time, the author studies Lucene, an open-source full-text search package. Based on this study, the thesis discusses a number of focused crawling algorithms and strategies that are representative of the dominant varieties published in the literature. The thesis then describes in detail the design and implementation of a focused search engine template, with which users can test the value of their ideas about focused search engines. Unlike a focused search engine that is limited to one specific theme, the template is geared to the needs of all kinds of users: any user can inject their own theme into the template, after which the template works as a focused search engine limited to that theme. The thesis also puts forward a new text classification method and a new focused crawling strategy.
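The idea of injecting a theme into a generic template can be illustrated with a minimal sketch. It is not the thesis's implementation: the keyword-based scorer stands in for a real text classifier, the best-first ordering is one common focused crawling strategy, and the names keyword_theme and focused_crawl, as well as the in-memory "web" dictionary, are hypothetical and introduced only for illustration.

```python
import heapq
from typing import Callable, Dict, List, Tuple

# Hypothetical in-memory "web": url -> (page text, outgoing links).
# A real crawler would fetch pages over HTTP instead.
Page = Tuple[str, List[str]]


def keyword_theme(keywords: List[str]) -> Callable[[str], float]:
    """Build a relevance scorer from theme keywords.

    Assumption: a simple keyword ratio stands in for a text classifier.
    """
    vocab = {k.lower() for k in keywords}

    def score(text: str) -> float:
        tokens = text.lower().split()
        if not tokens:
            return 0.0
        hits = sum(1 for t in tokens if t in vocab)
        return hits / len(tokens)

    return score


def focused_crawl(
    web: Dict[str, Page],
    seeds: List[str],
    relevance: Callable[[str], float],
    max_pages: int = 10,
    threshold: float = 0.1,
) -> List[Tuple[str, float]]:
    """Best-first crawl: links found on relevant pages are visited first."""
    # heapq is a min-heap, so priorities are stored negated.
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    collected: List[Tuple[str, float]] = []
    fetched = 0

    while frontier and fetched < max_pages:
        _, url = heapq.heappop(frontier)
        if url not in web:
            continue  # dead link in the toy web
        text, links = web[url]
        fetched += 1
        page_score = relevance(text)
        if page_score >= threshold:
            collected.append((url, page_score))
        # Each new link inherits the relevance of the page it was found on.
        for link in links:
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-page_score, link))

    return collected
```

Under these assumptions, the crawl loop is the template and the scorer passed as relevance is the injected theme: swapping in a different scorer changes the topic without changing the crawler. In a Lucene-based system, the collected pages would then be handed to an indexer.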
Keywords/Search Tags: focused search engine template, Lucene, text classification, focused crawling strategy