
The Vertical Search Engine Research And Design

Posted on: 2010-01-27
Degree: Master
Type: Thesis
Country: China
Candidate: F M Li
Full Text: PDF
GTID: 2208360275483392
Subject: Computer system architecture
Abstract/Summary:
General-purpose search engines cannot satisfy individual and professional search needs, which led to the emergence of vertical search engines. Vertical search engines can deliver more relevant results to meet users' requirements. This thesis focuses on several key technologies of a Chinese vertical search engine and implements a simple search engine comprising a spider, web page extraction, Chinese word segmentation, and an indexer; these parts are closely interrelated. We propose UBFC (URL rule based focused crawler), an algorithm built on an experimental crawler and a focused crawler. The kernel of the algorithm is a URL regular expression learner, which automatically learns and generalizes regular expressions from the URLs of sample web pages. The algorithm consists of the following parts: URL filter, pilot study, classification identification, and rule learning. We mine the correlation between the subject and links in order to judge whether a URL should be crawled. We redesign the dictionary mechanism and query algorithms for word segmentation, proposing a double-character-hash-indexing dictionary mechanism and a verbatim dichotomy segmentation dictionary mechanism. We use web page characteristics and submitted keywords to recognize new words, and we propose a method for extracting content from web pages. Finally, we design and implement a simple search engine: the global structure of the system and the relations among its components are introduced, some components are described in detail in terms of function and implementation, and a simple evaluation of search effectiveness and performance is given.
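The abstract names a double-character-hash-indexing dictionary with dichotomy (binary search) lookup but gives no details, so the following Python sketch is only one plausible reading, not the thesis's implementation. It assumes the first two characters of a word act as the hash key and that the remaining characters are kept in a sorted list searched by bisection. The class name `TwoCharHashDict`, the function `segment`, and the use of forward maximum matching as the segmentation strategy are all illustrative assumptions.

```python
from bisect import bisect_left
from collections import defaultdict


class TwoCharHashDict:
    """Toy dictionary: hash on the first two characters, binary search on the rest.

    Hypothetical structure; the thesis abstract does not specify the layout.
    """

    def __init__(self, words):
        words = list(words)
        buckets = defaultdict(set)
        for w in words:
            if len(w) >= 2:
                buckets[w[:2]].add(w[2:])
        # Sort each bucket once so that lookups can use binary search.
        self.buckets = {k: sorted(v) for k, v in buckets.items()}
        self.max_len = max((len(w) for w in words), default=1)

    def __contains__(self, word):
        if len(word) < 2:
            return False
        tails = self.buckets.get(word[:2])
        if tails is None:
            return False
        tail = word[2:]
        i = bisect_left(tails, tail)
        return i < len(tails) and tails[i] == tail


def segment(text, dictionary):
    """Forward maximum matching (an assumed strategy).

    Characters that start no dictionary word are emitted as single tokens.
    """
    tokens, i = [], 0
    while i < len(text):
        longest = min(dictionary.max_len, len(text) - i)
        for n in range(longest, 1, -1):
            if text[i:i + n] in dictionary:
                tokens.append(text[i:i + n])
                i += n
                break
        else:
            tokens.append(text[i])
            i += 1
    return tokens


if __name__ == "__main__":
    d = TwoCharHashDict(["垂直", "搜索", "搜索引擎", "引擎"])
    print(segment("垂直搜索引擎", d))
```

Under these assumptions, hashing on two characters keeps each bucket small, so the binary search only has to compare the short remainder of each candidate word.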
Keywords/Search Tags: vertical search engine, spider, Chinese word segmentation, information extraction