Distributed Improvements to the IRST-Based Search Engine

Posted on: 2009-07-05; Degree: Master; Type: Thesis
Country: China; Candidate: H He; Full Text: PDF
GTID: 2208360272459188; Subject: Computer software and theory
Abstract/Summary:
As internet applications have developed, networked applications and services have become increasingly common components of software systems. Heterogeneous network environments, together with many types of applications and services, now make up distributed systems of various forms. How to let these applications and services communicate with one another, and how to let client systems discover and invoke them in a unified, standard way, has become a practical and important problem. Web Services, proposed by international standards organizations and backed by a family of related network standards, address this problem. A search engine, as one of the most important network application services, should offer a distributed invocation method that other client applications can use conveniently. The search engine based on the IRST (Inter-relevant Successive Tree) was implemented as a stand-alone application that runs on a single machine and cannot be invoked remotely. This thesis describes how the original search engine was turned into a distributed system using Web Service technology.
Meanwhile, CPU fabrication technology has run into physical limits, and the steady frequency gains traditionally associated with Moore's Law have stalled. Clock frequency can hardly be raised any further, so CPU manufacturers now concentrate on multi-core designs, and it is no longer practical to expect better performance from higher clock speeds alone. Distributed computing offers an alternative: the application runs in parallel on a cluster composed of many individual nodes, each single-core or multi-core. This approach is especially suitable for large-scale data processing such as search engine indexing.
In this thesis, we use the MapReduce distributed computing framework to improve the indexing stage of the IRST-based search engine, so that indexing is carried out in parallel across the cluster. As a result, the time needed for indexing is substantially reduced.
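The map, shuffle, and reduce steps behind this kind of indexing can be sketched as follows. This is a minimal, single-process illustration only: the abstract does not show the thesis's own code or the IRST index structure, so a plain inverted index (term to document IDs) stands in for it, and the names map_phase, shuffle, reduce_phase, build_index, and the sample documents are all hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(doc_id, text):
    """Map step: emit one (term, doc_id) pair per word of a single document."""
    return [(term, doc_id) for term in text.lower().split()]


def shuffle(pairs):
    """Group emitted values by key, as a MapReduce framework does between phases."""
    grouped = defaultdict(list)
    for term, doc_id in pairs:
        grouped[term].append(doc_id)
    return grouped


def reduce_phase(term, doc_ids):
    """Reduce step: collapse one term's values into a sorted, de-duplicated posting list."""
    return term, sorted(set(doc_ids))


def build_index(docs):
    """Run map over every document, then shuffle, then reduce per term."""
    mapped = chain.from_iterable(map_phase(d, t) for d, t in docs.items())
    return dict(reduce_phase(term, ids) for term, ids in shuffle(mapped).items())


# Hypothetical sample corpus: document ID -> text.
docs = {
    1: "distributed search engine",
    2: "search engine index",
    3: "distributed index",
}
index = build_index(docs)
```

Each map_phase call reads only its own document and each reduce_phase call reads only one term's values, so on a real cluster a framework could assign these calls to different nodes; here they simply run one after another in a single process.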
Keywords/Search Tags: Search Engine, Distributed System, Web Services, Inter-relevant Successive Tree, Distributed Computing, MapReduce