Research And Implementation Of Meta Search Engine

Posted on: 2012-06-06
Degree: Master
Type: Thesis
Country: China
Candidate: C Y Wang
Full Text: PDF
GTID: 2178330332999628
Subject: Software engineering
Abstract/Summary:
With the rapid expansion of the Internet, search engines help people extract valuable knowledge from massive amounts of information. However, existing independent search engines still have many problems: their resource coverage is limited, and their result sets are huge and complex, so it is hard for people to find the desired information easily and quickly. For the same retrieval request, different search engines may return different results, so users need to switch between search engines to find comprehensive and valuable information. The meta search engine was developed on the basis of independent search engines: it calls several component search engines simultaneously and then presents the processed result sets to users. Compared with an independent search engine, a meta search engine draws on more sources of information and improves recall and precision to a certain degree. It can therefore better satisfy users' retrieval demands, and its applications are increasingly widespread.

Component search engines often return a large number of retrieval results. However, differences between component search engines, such as different query parameters, different sorting methods, and different relevance functions, make results merging difficult and restrict the capability of a meta search engine. Based on research into and analysis of the key technologies, and using Eclipse as the main development tool, we designed and implemented a simple meta search engine, MSE. Our main work is as follows:

1. We introduced the working principle and key technologies of meta search engines and analyzed the algorithms involved in those technologies.
2. We studied the scheduling algorithms for component search engines in depth and compared several typical algorithms. The system we developed calls four search engines (Baidu, Yahoo, Sogou, and Google) simultaneously. To handle problems such as differences in query parameters, we gave a character encoding method and a query format transformation method. Calling the engines in parallel shortens retrieval time and maintains retrieval efficiency.
3. We conducted intensive research on the results merging technology of meta search engines. A simple dead link detection algorithm is used to eliminate dead links, and the samefile() method is used to identify duplicate URLs and remove duplicate pages. We also improved the abstract ranking algorithm. The experimental data indicate that the new algorithm improves the relevance of retrieval results to some extent and can sort results consistently according to global relevance.

This thesis also analyzes system performance. The results show that MSE improves retrieval efficiency to a certain extent and that there is still room for further improvement. MSE should be improved in the following directions:

1. The system has only four component search engines. An appropriate scheduling strategy should be selected so that the system automatically chooses the search engines that perform well.
2. For results merging, a more effective algorithm should be used to eliminate duplicate pages, and retrieval results should be clustered automatically to help users search and browse relevant information.
3. The system should provide personalized service and give users a wider range of options. A feedback learning algorithm could be combined with user profiles to achieve intelligent information filtering.
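
The pipeline summarized in the abstract (per-engine query formatting and character encoding, parallel calls to component engines, and duplicate URL removal during merging) can be illustrated with a minimal Python sketch. This is not the thesis's implementation. The engine names, URL templates, and encodings are hypothetical. The helper `normalize_url` is a simple stand-in for duplicate detection and does not reproduce the samefile() method. The round-robin interleaving in `merge_results` is a swapped-in baseline, not the improved abstract ranking algorithm.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import quote_plus, urlsplit

# Hypothetical per-engine query templates and character encodings
# (illustrative only; not taken from the thesis).
ENGINES = {
    "engine_a": {"template": "https://a.example/search?q={q}", "encoding": "utf-8"},
    "engine_b": {"template": "https://b.example/s?wd={q}", "encoding": "gbk"},
}


def build_query_url(engine, query):
    """Encode the query in the engine's charset and fill in its URL template."""
    cfg = ENGINES[engine]
    return cfg["template"].format(q=quote_plus(query, encoding=cfg["encoding"]))


def normalize_url(url):
    """Build a comparison key: lowercase scheme and host, no fragment, no trailing slash."""
    parts = urlsplit(url)
    key = f"{parts.scheme.lower()}://{parts.netloc.lower()}{parts.path.rstrip('/')}"
    if parts.query:
        key += "?" + parts.query
    return key


def search_all(query, fetchers):
    """Call every component engine in parallel.

    `fetchers` maps an engine name to a callable(query) that returns a
    ranked list of (url, title) tuples.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max(1, len(fetchers))) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in fetchers.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception:
                results[name] = []  # a failing engine contributes no results
    return results


def merge_results(per_engine):
    """Interleave the ranked lists round-robin, skipping URLs already seen."""
    seen = set()
    merged = []
    longest = max((len(hits) for hits in per_engine.values()), default=0)
    for rank in range(longest):
        for name, hits in per_engine.items():
            if rank < len(hits):
                url, title = hits[rank]
                key = normalize_url(url)
                if key not in seen:
                    seen.add(key)
                    merged.append((url, title, name))
    return merged
```

In use, each fetcher would wrap an HTTP request to the URL produced by `build_query_url` and parse the returned result page. Passing stub fetchers that return fixed lists is enough to exercise the parallel call and the merging step without network access.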
Keywords/Search Tags: Meta Search Engine, Component Search Engine, Duplicate Web Pages Removal, Results Merging, Abstract Ranking