The Design And Implementation Of The Search Engine System

Posted on: 2012-09-09
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Chen
GTID: 2178330335950731
Subject: Software engineering
Abstract/Summary:
Search engine technology derives from full-text retrieval technology. A search engine consists of a web crawler, index storage, and the engine itself. The work process of the company's current search engine can be divided into two steps. First, the web crawler collects web data from the internet and stores it on a local server. Second, the index program indexes the web pages held on the local server to build the search directory. The engine must also handle mass data storage and internet access.

This thesis describes the design and implementation of the search engine system. Based on a comparison of rule-matching segmentation, statistical segmentation, and understanding-based segmentation, the system selects a suitable method for word analysis. It improves the speed of Chinese segmentation and the accuracy of article understanding by improving the existing rule-matching segmentation system. On top of the new Chinese segmentation, the system adds a Chinese name identification module to segment Chinese names that are not in the word bank. The system also adds a new engine broker, which handles users' requests and distributes the request data to the search server cluster. This module formats client requests before sending them to the search servers, then returns the search servers' results to the clients. It runs as a daemon process on a Red Hat Linux server and takes load balancing into account. During development, the actual operation of the old system and the new one was compared to confirm the system's stability and to validate the development work on the new system. The new system's Chinese segmentation is clearly improved in Chinese name identification and in the handling of language ambiguity. With the addition of dynamic summaries, the search results page provides a friendlier user experience. In addition, the new engine broker can increase the system's throughput.
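
The abstract does not say which rule-matching algorithm the system uses. As an illustration only, the sketch below shows forward maximum matching, a common dictionary-based segmentation technique, assumed here as a stand-in. The function name `forward_max_match`, the `max_len` parameter, and the tiny word bank are hypothetical and not taken from the thesis.

```python
def forward_max_match(text, lexicon, max_len=4):
    """Segment text by greedily taking the longest word-bank entry at each position.

    Illustrative sketch only: the thesis does not specify its matching rules.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a word-bank entry matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback for unknown words.
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical word bank for demonstration.
lexicon = {"搜索", "引擎", "搜索引擎", "系统", "设计"}
print(forward_max_match("搜索引擎系统设计", lexicon))
```

With this word bank the longest entry wins at each position, so "搜索引擎" is kept whole rather than split into "搜索" and "引擎". Characters that are not in the word bank, such as those of an unlisted personal name, fall out as single-character tokens, which is the kind of gap a separate name identification module would need to cover.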
Keywords/Search Tags: Search Engine, Chinese Segmentation, Engine Broker