As a driving force in the development of Internet and Web technology, the general search engine plays an increasingly important role in Web services and has become a focus of public attention. As the volume of Web data grows and its hierarchical structure and dynamics become more complex, it is increasingly difficult for a general search engine to satisfy users, because its problem-solving capability is limited. The meta search engine was introduced in response. A meta search engine is a Web application that translates and forwards a user's search request to its component search engines, collects their results, and returns them to the user after further processing.

This dissertation discusses general search engines and meta search engines in depth and analyzes two key technologies of meta search engines: search result ranking and search result de-duplication.

The main work and research results are as follows:

(1) Observing that a well-performing component search engine finds high-quality results, and that high-quality results are found by well-performing component search engines, and drawing on the interrelationship of Web page links, the dissertation defines the Hub value of a component search engine and the Authority value of a search result.

(2) Noting that the Hub value of a component search engine can fluctuate during a search process, the dissertation gives an algorithm that uses a set of topic words to calculate the Topic Hub value of a component search engine. The domain of a component search engine is thereby divided by topic, and for a single component search engine, different topic fields are described by different Topic Hub values.

(3) The dissertation ranks search results using the Topic Hub values of the component search engines.

(4) The dissertation analyzes two de-duplication techniques, Web-page-based and search-result-based, and two types of summary extraction methods, static and dynamic. The dynamic method based on keywords is analyzed, as is the phenomenon that summaries of search results drawn from reprinted Web pages or from the same Web page share many statement segments. The dissertation presents a de-duplication algorithm based on the similarity of statement segments: it constructs statement-space vectors from the set of statements in the result summaries and computes the similarity between them.

(5) MetaSearch, a meta search engine system, has been implemented. The system shows that the proposed algorithms are valid and effective.
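
The statement-segment de-duplication idea in item (4) can be illustrated with a minimal sketch. It is not the dissertation's implementation: the segmentation rule (splitting on punctuation), the binary vectors, the cosine measure, the threshold of 0.5, and all function names are assumptions chosen for illustration.

```python
import math
import re


def split_segments(summary):
    """Split a summary into normalized statement segments.

    Assumption: segments are delimited by common punctuation marks.
    """
    parts = re.split(r"[.,;:!?]+", summary.lower())
    return {" ".join(p.split()) for p in parts if p.strip()}


def segment_vectors(summaries):
    """Build one binary vector per summary over the shared statement space."""
    segs = [split_segments(s) for s in summaries]
    space = sorted(set().union(*segs))  # every distinct segment seen
    return [[1 if seg in s else 0 for seg in space] for s in segs]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is empty)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))
    return dot / norm if norm else 0.0


def deduplicate(summaries, threshold=0.5):
    """Return indices of summaries kept after dropping near-duplicates.

    A summary is dropped when its similarity to an already kept summary
    reaches the (assumed) threshold.
    """
    vectors = segment_vectors(summaries)
    kept = []
    for i, vec in enumerate(vectors):
        if all(cosine(vec, vectors[j]) < threshold for j in kept):
            kept.append(i)
    return kept
```

Calling `deduplicate(list_of_summaries)` keeps the first summary of each group whose members share most of their statement segments, which is the situation the abstract describes for reprinted pages. A production system would need a segmentation rule suited to the language of the summaries and a threshold tuned on real data.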