The Research And Implementation Of Distributed Search Engine Based On Lucene

Posted on: 2019-01-14; Degree: Master; Type: Thesis
Country: China; Candidate: Y H Cui; Full Text: PDF
GTID: 2428330590975239; Subject: Software engineering
Abstract/Summary:
With the rapid development of Internet technology, search engines have become an important means for people to obtain information. General-purpose search engines such as Google, Bing and Baidu already satisfy people's need for general information. Meanwhile, enterprise search engines are attracting more and more attention because of their importance to enterprise management and employee collaboration. As an enterprise grows in scale, it creates more and more information. To keep improving how efficiently document information is used, the design of an enterprise search engine usually has to meet the requirements of a distributed system architecture, such as high concurrency, scalability and high availability. This thesis researches and implements a distributed search engine for enterprise search based on the Lucene framework. The research covers the technical principles of search engines, the practical application of the Lucene framework and the working mechanism of the Hadoop distributed platform. Based on this research, the thesis designs and implements an enterprise-level distributed search engine:
1. Data acquisition is designed and implemented for different types of data sources. For example, data from internal CMS sites in an enterprise is acquired with a web crawler, and data in statically stored files with complex structure is acquired by dedicated extraction programs.
2. The index module is built with the MapReduce distributed computing model and the Lucene framework.
3. By combining PageRank with the TF-IDF algorithm, a topic-related ranking scheme for search results is provided.
4. In line with the nature of enterprise management, RBAC and a user-grouping permission model are introduced to implement access control over search results.
5. Functional tests and performance tests are carried out in a virtual machine cluster environment, and the test results show the feasibility and effectiveness of the system.
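The abstract does not say how PageRank and TF-IDF are combined or how the permission check is applied, so the following is only a minimal sketch under stated assumptions: the two scores are each normalized and blended with a weighted sum (the weight `alpha` is hypothetical), and access control is reduced to a group-membership check on each document. All function and parameter names are illustrative and are not taken from the thesis or from Lucene.

```python
import math
from collections import Counter


def tf_idf_scores(query_terms, docs):
    """Score each document against the query with a simple TF-IDF sum.

    docs maps doc_id -> list of tokens (assumed already tokenized).
    """
    n = len(docs)
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))  # document frequency per term
    scores = {}
    for doc_id, tokens in docs.items():
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if tf[term]:
                idf = math.log((1 + n) / (1 + df[term])) + 1.0  # smoothed IDF
                score += (tf[term] / len(tokens)) * idf
        scores[doc_id] = score
    return scores


def normalize(values):
    """Scale scores to [0, 1] by dividing by the maximum."""
    top = max(values.values(), default=0.0)
    return {k: (v / top if top > 0 else 0.0) for k, v in values.items()}


def rank(query_terms, docs, pagerank, allowed_groups, user_groups, alpha=0.7):
    """Return (doc_id, score) pairs the user may see, best first.

    Assumption: final score = alpha * TF-IDF + (1 - alpha) * PageRank,
    both normalized. allowed_groups maps doc_id -> set of group names.
    """
    text = normalize(tf_idf_scores(query_terms, docs))
    link = normalize(pagerank)
    results = []
    for doc_id, text_score in text.items():
        if text_score == 0.0:
            continue  # document does not match the query
        if not (allowed_groups.get(doc_id, set()) & user_groups):
            continue  # user shares no group with the document: hide it
        combined = alpha * text_score + (1 - alpha) * link.get(doc_id, 0.0)
        results.append((doc_id, combined))
    return sorted(results, key=lambda r: r[1], reverse=True)
```

Filtering before sorting keeps restricted documents out of the result list entirely, so their existence is not revealed by position or count. A real deployment would more likely push the permission filter into the index query itself rather than post-filtering in application code.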
Keywords/Search Tags: Search Engine, Distributed System, Lucene, Hadoop