The Design And Optimization Of A Full-text Database-oriented Search Engine System

Posted on: 2009-09-01
Degree: Master
Type: Thesis
Country: China
Candidate: H M Zhang
GTID: 2178360242983082
Subject: Computer application technology
Abstract/Summary:
As the internet industry grows ever faster, new requirements have emerged for database applications. In particular, the appearance of large-scale databases and data islands makes sharing and integrating databases, especially scientific databases, more difficult.

To solve this problem, DartGrid was developed on the basis of traditional database integration solutions. It takes full advantage of the "Database Grid and Semantic Web" concept and has become a successful solution for data integration between physically or logically different databases.

The DartSearch full-text search engine system grew out of the DartGrid core platform. DartSearch is now in its third version, and this thesis covers the design, optimization, and implementation of the DartSearchV3 system.

We first briefly introduce the latest developments in large-scale scientific data sharing and search engine technology, followed by some basic background on Lucene. We then analyse the existing problems of DartSearchV2 and set out the key problems that DartSearchV3 has to solve, together with its structure and system design.

The focus of this thesis is the three core modules of the DartSearchV3 system: the Chinese word segmentation algorithm, the index mechanism, and the rank mechanism. For each module we describe the technology, architecture, algorithm, core implementation, and results. In addition, we introduce a VML-based semantic tool kit and a related picture searching tool. Throughout the discussion of design and optimization, we focus on the functionality, practicality, and ease of use of the full-text database-oriented search engine system.

In the last part of this thesis, we briefly analyse some problems that the DartSearchV3 system will face in the future and point out the direction of its development.
Keywords/Search Tags: DartGrid, Chinese segmentation, Index, Rank, VML Semantic graph, Lucene