The Study Of The Framework Of Distributed Intelligent Search Engine Based On Map/Reduce

Posted on: 2009-06-05  Degree: Master  Type: Thesis
Country: China  Candidate: Z C Fu
GTID: 2178360272971074  Subject: International Trade
Abstract/Summary:
With the economic rise of search, more and more people are paying attention to the performance, technology and daily traffic of the world's major search engines. Enterprises decide whether to advertise on a search engine based on its popularity and daily traffic. Ordinary Internet users choose a favorite search engine for finding information according to its performance and technology. Technicians choose a representative search engine as the object of their research. The economic rise of search engines once again demonstrates the tremendous business opportunities offered by the Internet. Without search engines, the Internet would be nothing but an empty clutter of data, a gold mine that could be dug only with great difficulty. Today, the amount of information on the Internet grows exponentially every day, and faced with processing and storing such massive data, the traditional centralized search engine appears powerless. Moreover, traditional search engine systems generally use a keyword-matching model and cannot understand users' search intentions, which makes it very difficult for users to find the information they really want on the Internet. Distributed intelligent search engines therefore represent the future direction of development.

From the perspective of research and design, this thesis analyzes and discusses in detail the theory and technology of distributed intelligent search engines. The research on the framework is divided into three closely correlated levels that together support a distributed intelligent search engine based on Map/Reduce: first, the theory and methodology of distributed parallel computing; second, the principles of search engines; and third, the theory and methodology of distributed intelligent search engines.
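The first two levels meet in one classic task: building an inverted index, the core data structure of a search engine, with the Map/Reduce model. The sketch below is a minimal in-process illustration in plain Python. It is not the Hadoop API and not the thesis's implementation. It assumes simple whitespace tokenization, which only suits already-segmented text, since Chinese text would first need word segmentation. All function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(doc_id: str, text: str) -> Iterator[Tuple[str, str]]:
    """Map: emit one (term, doc_id) pair for every distinct term in a document."""
    # Assumption: whitespace tokenization; Chinese text would need segmentation first.
    for term in set(text.lower().split()):
        yield term, doc_id


def shuffle(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Shuffle: group intermediate values by key, as the framework would do between phases."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(term: str, doc_ids: List[str]) -> Tuple[str, List[str]]:
    """Reduce: turn the grouped document ids into a sorted posting list."""
    return term, sorted(doc_ids)


def build_inverted_index(docs: Dict[str, str]) -> Dict[str, List[str]]:
    """Run map, shuffle and reduce sequentially in one process (hypothetical helper)."""
    intermediate = (
        pair for doc_id, text in docs.items() for pair in map_phase(doc_id, text)
    )
    return dict(reduce_phase(term, ids) for term, ids in shuffle(intermediate).items())


if __name__ == "__main__":
    docs = {"d1": "distributed search engine", "d2": "intelligent search"}
    index = build_inverted_index(docs)
    print(index["search"])  # ['d1', 'd2']
```

In a real cluster, the map and reduce calls would run on different nodes, and the shuffle would move data over the network. The sequential version only shows how the work divides into independent per-document and per-term steps.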
The main content of the thesis is as follows. First, the thesis discusses the current development status of search engines in China and abroad, together with their existing problems and development trends. After analyzing the working principles and main functions of search engines, it elaborates the theories of distributed computing, grid computing and cloud computing, as well as the Map/Reduce distributed computing model. The open-source search engine toolkit Lucene and the open-source distributed computing framework Hadoop are also analyzed and studied.

Based on the Map/Reduce distributed computing model and a semantic dictionary, the thesis then studies distributed intelligent search engine systems and designs and implements IEBSou, a distributed intelligent search engine based on Map/Reduce. The thesis focuses on the framework through which the IEBSou system is realized. It shows the relationships between the modules and also analyzes the principles and ideas behind the implementation of each module. Next, the basic Map/Reduce framework of IEBSou is designed. Combined with Lucene, a unified document-processing framework is designed, and Chinese personal-name recognition and new-word recognition are studied. The thesis proposes a duplicate-page elimination algorithm based on Map/Reduce and a search-recommendation word generation algorithm based on semantic association. By constructing a concept set, IEBSou can intelligently generate semantically related words for users. In addition, IEBSou uses the semantic dictionary to semantically expand the user's search keywords and build a concept set. The system can therefore intelligently understand the user's search intent and improve recall and precision.
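The abstract does not say how the Map/Reduce duplicate-page elimination algorithm works. As a hedged illustration only, the sketch below assumes exact-duplicate detection. The map step emits an MD5 fingerprint of the whitespace- and case-normalized page text as the key. The shuffle groups pages that share a fingerprint. The reduce step keeps one representative URL per group, here chosen as the lexicographically smallest. The fingerprinting scheme, the representative rule and all function names are assumptions, not the thesis's method. Near-duplicate detection, for example shingling or SimHash, would need a different key.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

Page = Tuple[str, str]  # (url, content)


def fingerprint(content: str) -> str:
    """Assumed dedup key: MD5 of whitespace- and case-normalized text (exact duplicates only)."""
    normalized = " ".join(content.split()).lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def dedup_map(page: Page) -> Iterator[Tuple[str, str]]:
    """Map: emit (fingerprint, url) so identical pages meet at the same reducer."""
    url, content = page
    yield fingerprint(content), url


def dedup_reduce(key: str, urls: List[str]) -> str:
    """Reduce: keep one representative per group (hypothetical rule: smallest URL)."""
    return min(urls)


def eliminate_duplicates(pages: List[Page]) -> List[str]:
    """Run map, shuffle and reduce in-process; return the surviving URLs, sorted."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for page in pages:
        for key, url in dedup_map(page):
            groups[key].append(url)  # shuffle: group by fingerprint
    return sorted(dedup_reduce(key, urls) for key, urls in groups.items())


if __name__ == "__main__":
    pages = [
        ("http://a/1", "Map Reduce  search"),
        ("http://b/1", "map reduce search"),
        ("http://c/1", "other page"),
    ]
    print(eliminate_duplicates(pages))  # ['http://a/1', 'http://c/1']
```

The fingerprint is used as the shuffle key so that all copies of a page reach the same reducer. No single node then has to compare every page against every other page.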
Keywords/Search Tags:Search Engine, Distributed Computing, Map/Reduce, HDFS