Research And Implementation On Search Engine

Posted on: 2010-03-09 | Degree: Master | Type: Thesis
Country: China | Candidate: J G Yan | Full Text: PDF
GTID: 2178360275995571 | Subject: Computer software and theory
Abstract/Summary:
RSS (Really Simple Syndication), also known as Rich Site Summary or RDF Site Summary, is widely accepted and applied, and the wealth of RSS-enabled sites has greatly improved how information on the Internet is browsed and used. General-purpose search engines handle RSS-format content poorly: searching it is inefficient and the index is slow to update. An RSS-based search engine overcomes these shortcomings and provides efficient, fast search over RSS pages. This thesis describes the basic working principles of search engines, the concepts of RSS, and the theory behind RSS search engines, and it studies and implements the collection of RSS seeds (feed sources), the parsing of RSS pages, index construction, retrieval, and the ranking of search results.

Building on the basic principles of search engines and on the RSS specification, the work proceeds in four steps. First, a web crawler is designed to collect RSS seeds. Second, the RSS seeds are parsed with an XML parser to extract the link, title, description, and other elements. Third, the page documents described by the RSS seeds are analyzed, and functions are written to extract each document's title, link, and description. Finally, text preprocessing, retrieval, and result ranking are studied: documents are represented with the vector space model, Chinese text is segmented with a rule-based word segmentation algorithm, an inverted file serves as the index, and a vector similarity algorithm ranks the search results.
Keywords/Search Tags: Search engines, website ranking, Chinese word segmentation, word stem extraction, indexing
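
The pipeline outlined in the abstract (parse RSS items with an XML parser, build an inverted file, rank results by vector similarity) can be illustrated with a minimal Python sketch. This is not the thesis implementation. The sample feed, the function names (parse_items, tokenize, build_index, search), and the TF-IDF weighting are all assumptions made for illustration, and the simple regex tokenizer is only a stand-in for the rule-based Chinese word segmentation the thesis describes.

```python
import math
import re
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Hypothetical sample feed; a real system would fetch feeds found by the crawler.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <link>http://example.com/</link>
    <description>Hypothetical sample data</description>
    <item>
      <title>Search engine indexing basics</title>
      <link>http://example.com/indexing</link>
      <description>How an inverted index maps terms to documents.</description>
    </item>
    <item>
      <title>Cooking with tomatoes</title>
      <link>http://example.com/tomatoes</link>
      <description>A recipe unrelated to search.</description>
    </item>
  </channel>
</rss>"""


def parse_items(feed_xml):
    """Extract title, link and description from every <item> element."""
    root = ET.fromstring(feed_xml)
    return [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "description": item.findtext("description", default=""),
        }
        for item in root.iter("item")
    ]


def tokenize(text):
    """Assumed stand-in tokenizer: lowercase runs of ASCII letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Inverted file: term -> {document id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, doc in enumerate(docs):
        counts = Counter(tokenize(doc["title"] + " " + doc["description"]))
        for term, tf in counts.items():
            index[term][doc_id] = tf
    return index


def search(query, docs, index):
    """Rank documents by cosine similarity of TF-IDF vectors (assumed weighting)."""
    n = len(docs)

    def idf(term):
        return math.log(1 + n / len(index[term]))

    # Squared length of each document vector, accumulated from the postings.
    doc_norm_sq = defaultdict(float)
    for term, postings in index.items():
        for doc_id, tf in postings.items():
            doc_norm_sq[doc_id] += (tf * idf(term)) ** 2

    # Query terms missing from the index cannot match anything, so skip them.
    q_counts = Counter(t for t in tokenize(query) if t in index)
    dot = defaultdict(float)
    q_norm_sq = 0.0
    for term, qtf in q_counts.items():
        w = idf(term)
        q_norm_sq += (qtf * w) ** 2
        for doc_id, tf in index[term].items():
            dot[doc_id] += (qtf * w) * (tf * w)

    ranked = sorted(
        ((s / math.sqrt(q_norm_sq * doc_norm_sq[d]), d) for d, s in dot.items()),
        reverse=True,
    )
    return [(docs[d]["link"], score) for score, d in ranked]


docs = parse_items(SAMPLE_FEED)
index = build_index(docs)
results = search("inverted index", docs, index)
```

Here results holds (link, score) pairs sorted by decreasing similarity. Only documents that share at least one term with the query receive a score, which is what lets the inverted file avoid scanning every document.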