Professional Search Engine Research And Design

Posted on: 2006-07-06
Degree: Master
Type: Thesis
Country: China
Candidate: Z X Shou
Full Text: PDF
GTID: 2208360152470378
Subject: Computer applications and technology
Abstract/Summary:
The amount of information on the Internet grows every day, and the Internet has become the broadest information repository available. That information is also vast and disorganized, so acquiring it quickly, accurately and completely has become an urgent problem for computer scientists. Search engines address this problem to some extent and help users find the information they need conveniently. However, today's search engines have low coverage of web pages, do not index content in a timely manner, and return imprecise results, so they cannot satisfy the needs of professional users. This thesis presents the theoretical analysis and system design of a specialized search engine, which is one direction of development within the search engine field. For information collection, in order to find the best search path, we adopt a non-greedy IpageRank strategy that dynamically adjusts the download direction of the web crawler, so that pages likely to be relevant to the topic are downloaded with high priority. This effectively customizes the search engine to its topic. Because a specialized search engine differs in character from a traditional one, we redesign the relevance measure used in information retrieval and apply a vector space model algorithm based on both the content and the structure of web pages. To address the vagueness of search keywords and the imprecision of search results, we search by ontology instead of by keywords.
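As an illustration of downloading likely relevant pages first, the following is a minimal sketch of a priority-ordered crawl frontier. It is not the thesis's non-greedy IpageRank strategy, whose details the abstract does not give. It is a plain greedy best-first queue, and the names `Frontier` and `anchor_relevance`, as well as the anchor-text scoring rule, are assumptions made for this example.

```python
import heapq
import itertools


class Frontier:
    """Best-first crawl frontier: URLs with a higher estimated relevance pop first."""

    def __init__(self):
        self._heap = []
        self._seen = set()
        self._counter = itertools.count()  # tie-breaker: equal scores pop in insertion order

    def push(self, url, score):
        """Queue a URL once; return False if it was already queued."""
        if url in self._seen:
            return False
        self._seen.add(url)
        # heapq is a min-heap, so store the negated score to pop the largest first.
        heapq.heappush(self._heap, (-score, next(self._counter), url))
        return True

    def pop(self):
        """Return the (url, score) pair with the highest score."""
        neg_score, _, url = heapq.heappop(self._heap)
        return url, -neg_score

    def __len__(self):
        return len(self._heap)


def anchor_relevance(anchor_text, topic_terms):
    """Hypothetical relevance estimate: fraction of topic terms found in the anchor text."""
    if not topic_terms:
        return 0.0
    words = set(anchor_text.lower().split())
    return len(words & topic_terms) / len(topic_terms)
```

A crawler built on this sketch would score each outgoing link when a page is parsed, push it onto the frontier, and always fetch the highest-scoring URL next. A non-greedy strategy would additionally allow some low-scoring links to be followed, since they can lead to relevant pages further along the path.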
In addition, we use linked documents and related concepts to filter useful information, whereas link structure is traditionally used to improve search accuracy. Our online system also provides two data mining methods, association rules and clustering, so that users can conveniently explore and browse the search results. The main work of this thesis includes the following aspects:
1. We survey search engines, analyze the main problems of today's specialized search engines, and discuss the primary defects of their search strategies.
2. To address ambiguity in Chinese word segmentation, we discuss overlapping ambiguities and propose corresponding resolution methods.
3. A non-greedy IpageRank search strategy is put forward, and an improved vector space model (VSM) method is adopted for relevance filtering of web pages.
4. To address the vagueness of keyword search and the imprecision of search results, an ontology-based ranking algorithm is proposed, which uses the ontology of the search terms to identify and rank the relevant web documents. This resolves the problems of synonymy, ambiguity and context sensitivity in the text retrieval process.
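The vector space model filtering that the abstract describes, based on both page content and page structure, can be sketched as follows. This is a minimal example under stated assumptions, not the thesis's improved VSM: terms are weighted by raw frequency rather than TF-IDF, structure is reduced to per-field weights, and the function names `weighted_vector` and `cosine` and the field names `title` and `body` are hypothetical.

```python
import math
from collections import Counter


def weighted_vector(fields, weights):
    """Build a term vector from page fields, scaling each term count by its field weight.

    fields  maps a field name (for example "title" or "body") to its text.
    weights maps a field name to a multiplier; unlisted fields count as 1.0.
    """
    vec = Counter()
    for name, text in fields.items():
        w = weights.get(name, 1.0)
        for term in text.lower().split():
            vec[term] += w
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse term vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

With a title weight above 1.0, a page whose title matches the topic scores higher than a page that only mentions the same terms in its body. A filter would keep pages whose similarity to the topic vector exceeds a chosen threshold.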
Keywords/Search Tags:Ontology, Data Mining, Search Engine, Specialized Search Engine, Spider, Chinese Phrase Segmentation