
Research and Design of a Search Engine Based on Distributed and Parallel Computing

Posted on: 2006-11-21
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Liu
Full Text: PDF
GTID: 2168360152989116
Subject: Computer application technology
Abstract/Summary:
The Internet can be seen as a very large information database, and extracting or retrieving useful information from it is an increasingly important problem. The search engine has become the most popular network service for information retrieval. A search engine is generally made up of a crawler, an index database, a searcher, and a user interface. The crawler downloads pages from the Web; a parser analyzes the content of the downloaded pages so that an index can be built; the index represents the documents in a form that is easy to search and is stored in the index database; the searcher computes the degree of match between the user's query keywords and the target documents; and the user interface accepts the query input, lets the user customize the result page, and returns the formatted results to the browser. A search engine has to process a very large amount of data, and the Internet itself is distributed in structure, so a search engine designed as a distributed system, in which several machines cooperate and compute in parallel, can achieve better cost performance. This thesis describes the construction of a Web search engine framework based on distributed parallel computing. The parallel programming model used in this study combines the SPMD and Task-Farming models. Functional decomposition, iterative decomposition, and geometric decomposition are used in combination. Threads are a popular paradigm for concurrent programming, and the parallel programs in this thesis are designed with Java Thread and ThreadGroup.
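The combination of Task-Farming and SPMD with Java Thread and ThreadGroup mentioned above can be illustrated with a minimal sketch. It is not the thesis's implementation: the class name `TaskFarmSketch`, the stop marker, and the counter standing in for real crawl work are all assumptions made for illustration. A master hands tasks to a shared queue, and identical workers (the same program on different data) take tasks from it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class TaskFarmSketch {
    // Hypothetical stop marker: tells a worker that no more tasks will arrive.
    private static final String STOP = "__STOP__";

    /**
     * Farms the given tasks out to workerCount identical worker threads
     * and returns how many tasks were processed.
     */
    public static int run(List<String> tasks, int workerCount) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        AtomicInteger processed = new AtomicInteger();
        ThreadGroup group = new ThreadGroup("crawler-workers");
        List<Thread> workers = new ArrayList<>();

        for (int i = 0; i < workerCount; i++) {
            // Every worker runs the same code on different data (SPMD).
            Thread t = new Thread(group, () -> {
                try {
                    while (true) {
                        String task = queue.take();
                        if (task.equals(STOP)) {
                            return;
                        }
                        // Placeholder for real work such as downloading a page.
                        processed.incrementAndGet();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "worker-" + i);
            workers.add(t);
            t.start();
        }

        // Master side (task farming): enqueue all tasks, then one stop marker per worker.
        queue.addAll(tasks);
        for (int i = 0; i < workerCount; i++) {
            queue.add(STOP);
        }

        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return processed.get();
    }

    public static void main(String[] args) {
        List<String> tasks = List.of("http://example.com/a", "http://example.com/b", "http://example.org/");
        System.out.println("processed = " + run(tasks, 2));
    }
}
```

The ThreadGroup here only gives the workers a common name space; in a fuller design it could also be used to interrupt all workers at once.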
Message passing is an effective approach to distributed parallel programming; Java Socket communication is adopted for message passing in the distributed computation. The master-slave structure is used mainly for the initial distribution of the collected URIs, which are then crawled in parallel by several node machines. For each URI that a node extracts, the hash code of the URI's host is taken modulo N to obtain the number of the node machine that should process it, and the URI is sent to that machine. The node machines follow the SPMD model: their processing procedures are similar, and they differ only in the URIs they handle. We design a search engine that indexes a portion of the documents on the Web as a full-text retrieval system. The crawling technique is to start with a set of URLs and extract further URLs from the pages they point to, which are followed recursively in a breadth-first or depth-first fashion. We discuss indexing, the robots protocol, web page analysis, data processing, and Chinese word segmentation. The applications and future of web search engines are introduced. Recall and precision are introduced as measures for evaluating and testing search engines. For result processing, several ranking algorithms, including PageRank, HITS, and HillTop, are compared and analyzed. After analyzing the defects of the PageRank and HITS algorithms, an improved approach is proposed that combines the relevance between keywords and anchor text, the relevance between keywords and documents, and analysis of a minimal set of relevant documents. Finally, the future development of search engines is discussed, and several application modes are introduced: mobile search, domain-specific search, and personalized search.
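The hash-modulo-N assignment of URIs to node machines described above can be sketched as follows. This is an illustration under assumptions, not the thesis's code: the class name `UriPartitioner` and the method `nodeFor` are hypothetical, and hashing the host name (rather than the full URI or an IP address) is one reading of the abstract.

```java
import java.net.URI;
import java.util.Locale;

public class UriPartitioner {
    /**
     * Returns the index (0 .. nodeCount-1) of the node machine that should
     * process the given URI, computed as hash(host) mod N.
     */
    public static int nodeFor(String uri, int nodeCount) {
        String host = URI.create(uri).getHost();
        // Fall back to the whole URI string when no host can be parsed.
        String key = (host != null) ? host.toLowerCase(Locale.ROOT) : uri;
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), nodeCount);
    }

    public static void main(String[] args) {
        int n = 4; // assumed number of node machines
        System.out.println(nodeFor("http://example.com/a.html", n));
        System.out.println(nodeFor("http://example.com/b.html", n));
    }
}
```

Because the key is the host, every URI from the same site maps to the same node, which keeps per-site state such as robots rules on one machine.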
Keywords/Search Tags: Search Engine, Distributed, Parallel, Segmentation, Thread, Java