
Study And Application On Search Engine Technique For Chinese News

Posted on: 2007-02-04 | Degree: Master | Type: Thesis
Country: China | Candidate: B Wang
GTID: 2178360182496101 | Subject: Software engineering
Abstract/Summary:
The WWW, which stores a great deal of data, has propelled the development of the Internet and has become an information platform for corporations and individuals alike. At the same time, the rapid growth in the quantity of information creates a need for information navigation and search on the Web, so Web data mining has become a significant part of data mining. Web data mining is the process of discovering interesting, latent, and useful patterns and data in the human-created content of the WWW. Compared with traditional data mining, Web data mining has the following properties:
1. It places higher demands on the efficiency of its algorithms.
2. It involves a high degree of concurrency.
3. It is dynamic.
4. It must organize and manage data efficiently.

Web data mining comprises Web content mining and Web structure mining. Web content mining obtains useful information from Web documents, and their descriptions, that people have organized; Web structure mining obtains useful information from the link structures that people have organized. Search engine technology meets the need to search for information on the Web in both of these respects: content mining and structure mining. A traditional search engine is chiefly composed of three parts:

Web Crawler → Indexing Engine → Search Ranking

The paper first analyzes and studies the general techniques of these three parts, then develops and adapts them to Web news in order to build a special-purpose Chinese news search engine. The main work of the paper focuses on the following five aspects:
1. The paper introduces the background, history, and development of search engines and classifies existing search engines comprehensively. It also discusses the functions and principles of the different types of search engines and, on that basis, analyzes their respective advantages and disadvantages.
2. By analyzing the structure and data flow of a news search engine, the paper abstracts it into three parts: the Web crawler, the indexing engine, and search ranking.
3. In the chapter on the Web crawler, the paper expounds several crawling strategies based on automatic analysis and on an HTML tree-based method. And in...
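The three-part data flow described above (crawler → indexing engine → search ranking) can be illustrated with a minimal, self-contained sketch. It is not the thesis's implementation, and several pieces are assumptions chosen for illustration: the in-memory `PAGES` dictionary stands in for real HTTP fetching; a breadth-first crawl using the standard library's event-driven `html.parser` is a simplified substitute for the HTML-tree crawling strategies the paper mentions; whitespace tokenization replaces the Chinese word segmentation a real news engine would need; and ranking uses a plain TF-IDF variant rather than whatever method the thesis adopts. All names (`PAGES`, `crawl`, `build_index`, `search`, the `news.example` URLs) are hypothetical.

```python
from collections import defaultdict, deque
from html.parser import HTMLParser
import math

# Hypothetical in-memory "Web": URL -> HTML. Stands in for real HTTP fetching.
PAGES = {
    "http://news.example/": '<html><body><a href="http://news.example/a">A</a> '
                            '<a href="http://news.example/b">B</a> front page</body></html>',
    "http://news.example/a": "<html><body><p>flood warning issued in the south</p>"
                             '<a href="http://news.example/b">more</a></body></html>',
    "http://news.example/b": "<html><body><p>stock market rises after flood relief</p></body></html>",
}


class LinkTextParser(HTMLParser):
    """Collects <a href> targets and text content from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.text.append(data)


def crawl(seed, pages, max_pages=100):
    """Stage 1, crawler: breadth-first crawl from `seed`; returns {url: text}."""
    seen, queue, docs = {seed}, deque([seed]), {}
    while queue and len(docs) < max_pages:
        url = queue.popleft()
        html = pages.get(url)
        if html is None:  # unreachable URL: skip it
            continue
        parser = LinkTextParser()
        parser.feed(html)
        parser.close()  # flush any buffered trailing text
        docs[url] = " ".join(parser.text)
        for link in parser.links:
            if link not in seen:  # enqueue each URL at most once
                seen.add(link)
                queue.append(link)
    return docs


def tokenize(text):
    # Assumption: whitespace tokens. Chinese news would need a word segmenter here.
    return text.lower().split()


def build_index(docs):
    """Stage 2, indexing engine: inverted index term -> {url: term frequency}."""
    index = defaultdict(dict)
    for url, text in docs.items():
        for term in tokenize(text):
            index[term][url] = index[term].get(url, 0) + 1
    return index


def search(query, index, n_docs):
    """Stage 3, ranking: sum a smoothed TF-IDF score over the query terms."""
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(1 + n_docs / len(postings))  # rarer terms weigh more
        for url, tf in postings.items():
            scores[url] += tf * idf
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    docs = crawl("http://news.example/", PAGES)
    index = build_index(docs)
    for url, score in search("flood warning", index, len(docs)):
        print(f"{score:.3f}  {url}")
```

Keeping the stages as separate functions mirrors the data flow of the three-part structure: the crawler's output is the indexer's only input, and the inverted index is all the ranking step needs at query time.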
Keywords/Search Tags: Application