Based On The Theoretical Study Of The Digital Organism Database Search Engine

Posted on:2009-10-10

Degree:Master

Type:Thesis

Country:China

Candidate:L F Ceng

Full Text:PDF

GTID:2208360245961709

Subject:Computer system architecture

Abstract/Summary:

With the rapid development of the Internet and the dramatic growth in demand for useful information, search engine technology has made great progress in the last decade. Most commercial search engines, such as Google and Yahoo, focus only on hypertext and lack coverage of other information resources. Because databases play an essential role in storing and accessing information, search engines for databases have become an attractive field of computer science in recent years.

This thesis designs a database search engine based on the Digital Organism Database System, a new generation of distributed database developed by our research office. The Digital Organism Database System is designed to arrange the distribution of databases and dispatch retrieval requests across a wide-area network made up of multiple server nodes. The search engine built on it allows users to retrieve relevant records stored in multiple databases via a series of keywords.

Building on techniques common in traditional search engines, such as word segmentation, text classification and information compression, this thesis improves several algorithms and engineering methods to raise the performance of the database search engine. It highlights the innovations and improvements we contributed to the theory and engineering of search engines for databases. The major work includes:

1. Propose an improved Chinese word segmentation algorithm for large-scale Chinese information processing, which is the basic phase of building a Chinese search engine. Using a prefix tree and dynamic programming, this algorithm boosts the speed of Chinese word segmentation while maintaining relatively high precision. It also provides a flexible approach to handling out-of-vocabulary words such as person names, place names and organization names.

2. A traditional SVM-based text classifier needs abundant labeled training documents, covering both the positive class and the negative class. To address the lack of negative training data, this thesis proposes an effective approach that integrates the Rocchio method and K-means clustering to obtain adequate negative training data for building the classifier. Experiments show that the new method improves the accuracy of the document classifier.

3. Propose a well-defined software architecture called distributed thread pool technology, which is essential to task dispatching among distributed server nodes.

Finally, rigorous experiments were conducted to verify the performance of the algorithms proposed in this thesis and the functions of the search engine based on the Digital Organism Database System.
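The abstract does not give the details of the segmentation algorithm, so the following is only a minimal sketch of the general technique it names: a prefix tree for dictionary lookup combined with dynamic programming over split points. The word counts, the log-probability scoring, the single-character fallback for out-of-vocabulary text, and the names `build_trie`, `segment` and `oov_count` are all assumptions made for illustration, not the thesis's actual method.

```python
import math


class TrieNode:
    """One node of the prefix tree; count > 0 marks the end of a word."""
    __slots__ = ("children", "count")

    def __init__(self):
        self.children = {}
        self.count = 0


def build_trie(word_counts):
    """Insert every dictionary word (with its hypothetical count) into a trie."""
    root = TrieNode()
    for word, count in word_counts.items():
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.count = count
    return root


def segment(text, root, total, oov_count=0.5):
    """Split text into the word sequence with the highest total log-probability.

    best[k] holds the best score of any segmentation of text[:k];
    back[k] holds the start index of the last word in that segmentation.
    A character with no dictionary match is emitted as a one-character
    word with a small assumed count (oov_count).
    """
    n = len(text)
    best = [-math.inf] * (n + 1)
    back = [0] * (n + 1)
    best[0] = 0.0
    for i in range(n):
        # Fallback: treat text[i] as an unknown single-character word.
        score = best[i] + math.log(oov_count / total)
        if score > best[i + 1]:
            best[i + 1], back[i + 1] = score, i
        # Walk the trie once to find every dictionary word starting at i.
        node = root
        for j in range(i, n):
            node = node.children.get(text[j])
            if node is None:
                break
            if node.count > 0:
                score = best[i] + math.log(node.count / total)
                if score > best[j + 1]:
                    best[j + 1], back[j + 1] = score, i
    # Follow the back-pointers from the end to recover the words.
    words = []
    k = n
    while k > 0:
        words.append(text[back[k]:k])
        k = back[k]
    return words[::-1]


# Hypothetical toy dictionary with made-up counts.
COUNTS = {"研究": 50, "研究生": 10, "生命": 40, "命": 5, "的": 200, "起源": 20}
TOTAL = sum(COUNTS.values())
TRIE = build_trie(COUNTS)

if __name__ == "__main__":
    print(segment("研究生命的起源", TRIE, TOTAL))
```

The trie walk finds all dictionary words that start at a given position in a single pass, which is where the speed-up over repeated substring lookups comes from. A real system would replace the single-character fallback with dedicated recognition of person, place and organization names.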

Keywords/Search Tags:

Digital Organism Database System, search engine, text classification

Related items

1	Enhanced Symbiotic Organism Search Algorithm And Application Research
2	Research On Search And Location Mechanism Based On Digital Organism Database System
3	The Design And Implementation Of Cross-language Navigational Search Engine
4	The Design And Implementation Of Cross-Language Navigational Search Engine
5	Research On The Topical Search Engine Based On Semantic
6	Research And Application Of Short Text Classification In Search Engine
7	The Research On Method Of Database Search Based On P2P Search Engine
8	Design Of The Digital Organism Database Fault Tolerance Mechanism
9	The Study And Implementation Of Some Key Technologies In Search Engine System Based On Classification Semantics
10	Web Text Mining Research Based On Subject-oriented Search Engine