Design And Implementation For Topic Specific Meta Search Engine Based On Web Data Mining

Posted on: 2010-12-16
Degree: Master
Type: Thesis
Country: China
Candidate: D B Yang
Full Text: PDF
GTID: 2178360278962182
Subject: Computer technology
Abstract/Summary:
This thesis presents a web data mining approach, built on the open source search engine Nutch 0.9 and related software packages, to design and implement a topic-specific meta search engine, TSMSE, with the goal of improving on the recall and precision of general search engines. We first develop a tool, TopicDistiller, based on web content mining and web link analysis, which extracts topic words and seed websites from the web pages retrieved by search engines. These express each topic and are later used in TSMSE to determine a page's topic and calculate its topic level. We then put forward the design philosophy of TSMSE as a topic-specific meta search engine with an independent database. The design combines the merit of a meta search engine, which improves recall by integrating the results of source search engines, with that of a topic-specific search engine, which improves precision through topic-focused crawling and indexing. Next, we set up the Nutch crawler, using its different crawl styles, to merge the web pages returned by the source search engines with those crawled from the specified seed websites. After that, we develop topic parser and topic indexer plug-ins that determine each page's topic and calculate its topic level, and we use these data to provide retrieval services with improved precision. Because the results of all source search engines are merged, recall is also improved. Then, by enhancing the retrieval functions and interface of Nutch 0.9, we implement TSMSE with an independent database so that retrieved results are presented to the user divided by topic and sorted by topic level. TSMSE also provides intelligent keyword prompts based on the history of retrieval records, automatic clustering of retrieved results, and paged browsing. At the end of the thesis, we carry out a simple performance test of TSMSE using six different topics. The experimental results show that both recall and precision are improved.
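The merge-then-rank flow described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation, which is built as Nutch 0.9 plug-ins. The function names (`normalize_url`, `topic_level`, `merge_and_rank`), the input format, and in particular the scoring rule (the fraction of a page's tokens that are topic words) are assumptions made for illustration, since the abstract does not state how topic level is calculated.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def normalize_url(url):
    """Lowercase scheme and host and drop a trailing slash so duplicates match."""
    parts = urlsplit(url.strip())
    normalized = f"{parts.scheme.lower()}://{parts.netloc.lower()}{parts.path.rstrip('/')}"
    if parts.query:
        normalized += "?" + parts.query
    return normalized


def topic_level(text, topic_words):
    """Hypothetical score: fraction of the page's tokens that are topic words."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in topic_words)
    return hits / len(tokens)


def merge_and_rank(result_lists, topics):
    """Merge result lists from several source engines, drop duplicate URLs,
    assign each page to its best-scoring topic, and return
    {topic: [(url, level), ...]} with each list sorted by descending level.

    `topics` maps a topic name to a set of lowercase topic words and is
    assumed to be non-empty.
    """
    pages = {}
    for results in result_lists:
        for url, text in results:
            # Keep the first copy seen for each normalized URL.
            pages.setdefault(normalize_url(url), text)

    grouped = defaultdict(list)
    for url, text in pages.items():
        best_topic, best_level = max(
            ((name, topic_level(text, words)) for name, words in topics.items()),
            key=lambda pair: pair[1],
        )
        if best_level > 0:  # pages matching no topic word are left out
            grouped[best_topic].append((url, best_level))

    for ranked in grouped.values():
        ranked.sort(key=lambda page: page[1], reverse=True)
    return dict(grouped)
```

Merging before scoring is what lets the combined result set cover more relevant pages than any single source engine, while the per-topic grouping and sorting correspond to the "divided by topic and sorted by topic level" output that the abstract describes.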
Keywords/Search Tags:Meta Search Engine, Topic Specific Search Engine, Nutch, Web Mining, Topic Specific Crawler, Data Mining