
Research and Implementation of an Enterprise Heterogeneous Data Search Engine Based on Solr

Posted on: 2016-11-23
Degree: Master
Type: Thesis
Country: China
Candidate: W R Ding
GTID: 2308330482975090
Subject: Software engineering
Abstract/Summary:
As enterprises grow, the large volumes of data generated on their internal networks are spread across many server nodes. Finding the required information accurately and quickly through an internal search engine is therefore a key problem. Although general-purpose search engines can retrieve data, they do not meet enterprise needs. First, the complexity of enterprise organization and the different permissions granted to different employees mean that employees need different target information. Second, enterprise data types vary widely: most internal data comes from databases and documents, unlike the web resources indexed by general-purpose search engines. Consequently, general-purpose search engines are not an ideal tool for supporting enterprise work tasks. This thesis proposes an enterprise search engine for large enterprises whose data centers span multiple regions. The search engine integrates and retrieves heterogeneous data, optimizes the ranking of search results, and recommends information based on a user personalization model. The thesis implements this enterprise heterogeneous data search engine on Solr. The main contributions are as follows:
(1) External heterogeneous data is acquired with Heritrix. The thesis also studies information extraction, metadata, and Chinese word segmentation. It implements intelligent extraction from heterogeneous resources, builds a model for indexing those resources, and delivers a search system for heterogeneous data.
(2) The open-source search engine Solr is used to add, delete, and update index entries. Solr also provides the query functions, from basic keyword queries to advanced searches with additional filter conditions.
(3) The Skyline algorithm is introduced to optimize the ordering of search results. It accommodates complex combinations of relevance score and the time each resource was released, so that results suit users in different work scenarios. A user personalization model is built from each user's history and usage habits and is used to recommend suitable information to that user.
(4) A ZooKeeper-based monitoring system is designed for enterprises with multiple data centers. It ensures that data on failed servers is never returned in search results, and when duplicate results must be returned, the copy from the server with the better network conditions is chosen.
The thesis develops a prototype enterprise heterogeneous data search engine based on Solr. The test environment is a multi-server cluster on which heterogeneous resources were indexed. Experimental results confirm the reliability and practical applicability of the work. This thesis provides a feasible solution for enterprise search.
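The abstract mentions Chinese word segmentation but does not say which method the thesis uses. As an illustration only, the sketch below shows forward maximum matching, a classic dictionary-based baseline: at each position it takes the longest dictionary word that matches, falling back to a single character. The function name `fmm_segment`, the toy dictionary, and the `max_len` default are hypothetical; in a real Solr deployment this step would normally be handled by an analyzer configured in the schema.

```python
def fmm_segment(text: str, dictionary: set[str], max_len: int = 4) -> list[str]:
    """Segment text by forward maximum matching against a word dictionary (hypothetical helper)."""
    tokens: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary hit;
        # a single character is always accepted so the loop always advances.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens
```

With the dictionary `{"企业", "搜索", "搜索引擎", "引擎", "数据"}`, `fmm_segment("企业搜索引擎", ...)` returns `["企业", "搜索引擎"]`: the greedy longest match prefers "搜索引擎" over "搜索" + "引擎". This greediness is also the method's known weakness on ambiguous text, which is why statistical segmenters are usually preferred in practice.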
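Contribution (2) relies on Solr to add, delete, and update index entries and to run keyword queries with extra filter conditions. The sketch below only builds the HTTP requests (it sends nothing), using Solr's JSON update handler, atomic `set` updates, and the standard `q`/`fq`/`rows` query parameters. The base URL, the collection name `enterprise_docs`, and the field names are hypothetical; atomic updates also assume the schema and update log are configured to support them.

```python
import json
from urllib.parse import urlencode

# Hypothetical Solr core/collection URL, for illustration only.
SOLR_BASE = "http://localhost:8983/solr/enterprise_docs"
UPDATE_URL = f"{SOLR_BASE}/update?commit=true"


def add_docs_request(docs: list[dict]) -> tuple[str, str]:
    """Add (or fully replace) documents: POST a JSON array to /update."""
    return UPDATE_URL, json.dumps(docs)


def atomic_update_request(doc_id: str, field: str, value) -> tuple[str, str]:
    """Modify one field of an existing document with an atomic 'set' update."""
    return UPDATE_URL, json.dumps([{"id": doc_id, field: {"set": value}}])


def delete_request(doc_id: str) -> tuple[str, str]:
    """Delete a document by its unique key."""
    return UPDATE_URL, json.dumps({"delete": {"id": doc_id}})


def select_url(keywords: str, filters: tuple[str, ...] = (), rows: int = 10) -> str:
    """Keyword query (q) narrowed by any number of filter queries (fq)."""
    params = [("q", keywords), ("rows", rows), ("wt", "json")]
    params += [("fq", f) for f in filters]
    return f"{SOLR_BASE}/select?{urlencode(params)}"
```

The update bodies would be POSTed with `Content-Type: application/json`. Keeping filters in `fq` rather than folding them into `q` is the usual choice because filter queries do not affect relevance scores and can be cached independently.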
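Contribution (3) uses the Skyline algorithm to combine relevance score and resource release time. A result is in the skyline when no other result is at least as relevant and at least as recent while being strictly better on one of the two. The abstract does not say how the skyline is turned into a full ranking, so the sketch below assumes one common option: peel off successive skyline layers and order each layer by score. The `Hit` type and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    doc_id: str
    score: float    # relevance score from the engine; higher is better
    timestamp: int  # release time, e.g. Unix seconds; newer is better


def dominates(a: Hit, b: Hit) -> bool:
    """a dominates b: no worse on both criteria and strictly better on at least one."""
    return (a.score >= b.score and a.timestamp >= b.timestamp
            and (a.score > b.score or a.timestamp > b.timestamp))


def skyline(hits: list[Hit]) -> list[Hit]:
    """Return the hits that no other hit dominates (the Pareto front)."""
    return [h for h in hits if not any(dominates(o, h) for o in hits)]


def skyline_layers(hits: list[Hit]) -> list[list[Hit]]:
    """Repeatedly remove the skyline; each layer is ordered by descending score."""
    remaining = list(hits)
    layers: list[list[Hit]] = []
    while remaining:  # a non-empty set always has a non-empty skyline
        layer = skyline(remaining)
        layers.append(sorted(layer, key=lambda h: h.score, reverse=True))
        remaining = [h for h in remaining if h not in layer]
    return layers
```

For hits A(0.9, 100), B(0.5, 300), C(0.4, 200), and D(0.95, 50), only C is dominated (by B, which is both more relevant and newer), so the layers are [D, A, B] followed by [C]. The quadratic dominance check is fine for a page of results; larger candidate sets would call for a block-nested-loop or sort-based skyline algorithm.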
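Contribution (4) excludes data on failed servers and, for duplicate results, keeps the copy from the better network environment. A common ZooKeeper pattern is for each server to register an ephemeral znode, which disappears when its session ends, so the set of live servers can be read from ZooKeeper. The sketch below assumes that set is already available as a plain Python set and uses a hypothetical per-copy latency as the network-quality measure; it does not call any ZooKeeper client API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    doc_id: str
    node: str          # server (data center node) holding this copy
    latency_ms: float  # hypothetical network-quality measure; lower is better


def select_copies(copies: list[Copy], live_nodes: set[str]) -> list[Copy]:
    """Keep one copy per document, only from live nodes, preferring the lowest latency."""
    best: dict[str, Copy] = {}
    for c in copies:
        if c.node not in live_nodes:
            continue  # node is not registered as live: never return its data
        current = best.get(c.doc_id)
        if current is None or c.latency_ms < current.latency_ms:
            best[c.doc_id] = c
    return list(best.values())
```

If document `d1` is held by live nodes `bj` (40 ms) and `sh` (15 ms), the `sh` copy is returned; a document stored only on a node missing from `live_nodes` is dropped entirely.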
Keywords/Search Tags: Enterprise search engine, Solr, Heterogeneous data, User personalization model