
Study On Implementation Of Enterprise Search Engine Based On Index Cloud

Posted on: 2012-08-01  Degree: Doctor  Type: Dissertation
Country: China  Candidate: X Y Chen  Full Text: PDF
GTID: 1118330371457136  Subject: Management Science and Engineering
Abstract/Summary:
With the rapid development of enterprise informatization, the internal data resources of enterprises are growing quickly. Enterprises therefore place higher demands on information management and resource access, which in turn raises the requirements for enterprise search engines. The index structure is one of the core technologies of a search engine and directly affects the performance of the whole system. For the design of an enterprise search engine, this dissertation applies the core ideas of cloud computing to the index system and presents a novel index framework, the Index Cloud. It also presents a new enterprise search engine architecture based on the Index Cloud design.

The dissertation first introduces the concept of a search engine and how search engines are classified, then examines their principles and technologies and reviews their development. It likewise presents the concept and classification of cloud computing and studies its core technologies.

Index organization is then studied in detail. The dissertation explains the concept of indexing and discusses several commonly used index file organizations: the B-tree, B+tree, R-tree and R*-tree. Methods for composing index entries are also discussed, including the forward index, inverted index, suffix array and signature file.
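As a concrete illustration of the inverted index named above, the following is a minimal Python sketch. It is not taken from the dissertation; the function names `build_inverted_index` and `search_all` are hypothetical, and tokenization is reduced to lowercasing and whitespace splitting.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search_all(index, query):
    """Return ids of documents that contain every query term (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


docs = {
    1: "index cloud search",
    2: "enterprise search engine",
    3: "cloud search engine",
}
index = build_inverted_index(docs)
hits = search_all(index, "cloud search")
```

A forward index would store the opposite mapping, from each document id to its terms; the inverted form is what lets a query jump directly from a term to the matching documents.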
Building on search engine theory, cloud computing and index theory, the dissertation proposes the Index Cloud model. The model is designed around three basic principles: classified data storage, distributed computing and parallel processing. Its notable features include a high degree of virtualization, high performance, high reliability, strong security, scalability and versatility, which make it well suited to the requirements of an enterprise search engine.

The Index Cloud model is then studied in depth. The dissertation gives a detailed definition of the Index Cloud together with its principles and basic characteristics. To address the retrieval performance and scalability problems that index organization poses for search engines, it compares the basic index organization strategies and adopts a hybrid distributed index organization strategy for the Index Cloud. For the index data structure, a new B+tree-based dictionary index tree (DicB+Tree) is used to form the DPIC (Distributed & Paralleling Index Cloud), and the core management strategies of the Index Cloud are designed on top of DPIC so that system resources can be utilized. The dissertation presents the internal processing architecture of the Index Cloud, the distributed parallel index tree structure, and the methods for distributing, replicating, migrating and rebuilding index data. It also describes how retrieval tasks are analyzed and scheduled in a distributed manner within the Index Cloud.

The dissertation then systematically reviews the concept and characteristics of the enterprise search engine and the state of enterprise search technology and research, and analyzes how the needs of enterprise search differ from those of traditional web search in terms of searching, retrieval, the objects retrieved and security.
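The abstract does not describe the internals of DicB+Tree, so the sketch below only illustrates the general idea of an ordered term dictionary that routes a lookup to the index node holding that term's range. It uses Python's `bisect` over a sorted list of boundary terms as a stand-in for the ordered keys of a B+tree; the class `TermRangeRouter` and the node names are assumptions made for illustration.

```python
import bisect


class TermRangeRouter:
    """Route a term to the index node that owns its lexicographic range.

    boundaries[i] is the first term owned by nodes[i + 1]; every term that
    sorts before boundaries[0] belongs to nodes[0].
    """

    def __init__(self, boundaries, nodes):
        if len(nodes) != len(boundaries) + 1:
            raise ValueError("need exactly one more node than boundaries")
        if list(boundaries) != sorted(boundaries):
            raise ValueError("boundaries must be sorted")
        self.boundaries = list(boundaries)
        self.nodes = list(nodes)

    def node_for(self, term):
        # bisect_right counts the boundaries that are <= term,
        # which is exactly the position of the owning node.
        return self.nodes[bisect.bisect_right(self.boundaries, term)]


router = TermRangeRouter(["h", "q"], ["node-a", "node-b", "node-c"])
```

With these hypothetical boundaries, `router.node_for("cloud")` resolves to `node-a`, `router.node_for("index")` to `node-b`, and `router.node_for("search")` to `node-c`.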
Enterprise search systems therefore need to be studied comprehensively, covering system architecture, index organization strategy, information retrieval algorithms and scheduling algorithms, and the dissertation proposes combining the enterprise search engine with cloud computing.

The design of the Index Cloud model rests on three fundamentals: classified data storage, distributed computing and parallel processing. It is characterized by virtualization, high performance, high reliability, strong security, easy extensibility and universality, and is therefore better suited to the requirements of an enterprise search engine.

An enterprise search engine architecture based on the Index Cloud is then put forward. The new architecture not only resolves problems found in full-text search systems, such as index data inflation, network bandwidth bottlenecks and disk I/O bottlenecks, but also provides efficient data storage and parallel computing services. A distributed task scheduling model is established for the architecture; it takes the task load level of each index node and the index frequency into account in order to optimize task allocation, avoid hot spots and ultimately improve system performance.

Finally, a prototype Index Cloud system based on Hadoop and Lucene was built as a platform for validating system performance. Extensive simulation studies were conducted on response time, throughput, load balance and precision ratio. The experimental results demonstrate the feasibility of the approach and its satisfactory practical effect.
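The scheduling idea described above, weighing an index node's task load against its index frequency to avoid hot spots, can be sketched roughly as follows. The abstract gives no formula, so the linear score, the weights `load_weight` and `freq_weight`, and the `IndexNode` fields are all assumptions made for illustration rather than the dissertation's model.

```python
from dataclasses import dataclass


@dataclass
class IndexNode:
    name: str
    load: float        # current task load, normalised to [0, 1] (assumed scale)
    index_freq: float  # how often the node's index is hit, normalised to [0, 1]


def pick_node(candidates, load_weight=0.7, freq_weight=0.3):
    """Pick the candidate with the lowest weighted load/frequency score."""
    if not candidates:
        raise ValueError("no candidate nodes")
    return min(
        candidates,
        key=lambda n: load_weight * n.load + freq_weight * n.index_freq,
    )


nodes = [
    IndexNode("a", load=0.9, index_freq=0.1),
    IndexNode("b", load=0.2, index_freq=0.5),
    IndexNode("c", load=0.4, index_freq=0.9),
]
chosen = pick_node(nodes)
```

Under these assumed weights the scores are 0.66, 0.29 and 0.55, so node `b` is chosen: the busy node `a` and the frequently hit node `c` are both passed over.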
Keywords/Search Tags:Enterprise Search Engine, Index Cloud, Index Organization Strategy, Tree-Index Structure, Task Scheduling Design