The Implementation And Optimization Of Large Scale Enterprise Search Engine

Posted on:2016-04-27

Degree:Master

Type:Thesis

Country:China

Candidate:M Xuan

GTID:2308330479482181

Subject:Software engineering

Abstract/Summary:

When an enterprise holds a large body of documentation, it often relies on a search engine for information retrieval. The traditional solution is to purchase a commercial search engine. However, documentation differs from web pages: a single document is much larger than a web page. When dealing with such a large amount of data, a traditional general-purpose search engine usually performs poorly.

This project provides a solution for building an enterprise search engine that can handle large-scale data and achieve state-of-the-art performance. The core is the open source distributed search engine Elasticsearch, with its configuration optimised for the retrieval needs of these documents. The optimizations include redesigning the index schema, adapting the index storage strategy, and so on. These approaches decreased the size of the index by 55%, shortened the index rebuild time by 50%, and improved query response speed by 45%, without removing the highlight functionality.

This project has also developed a number of core search engine functions that provide a better search experience for users: spelling correction, query recommendation, and personalised search results.

The Spell Correction module addresses misspellings of Chinese words, misspellings of English words, and spelling errors in camel-case strings, and supports a user-defined dictionary to deal with common spelling errors. The module generates candidate corrections mainly on the basis of distance (including edit distance, Pinyin distance, character string distance, etc.), then uses learning to rank to select the best correction. The module scored 86% on the best-N measure on a selected test data set.

The Query Recommendation module is dedicated to helping customers optimise their queries. It uses the query logs and the document corpus to generate recommendations, and it also uses the Learning to Rank method to select the top 10 recommendations as its result. On selected test data with ground truth labelled by humans, the module obtained 100% coverage and 92.6% accuracy.

The Personalised Search module is committed to providing the best search results for a given user. It uses a user model generated from topic model training and the user's click history. We have also designed a method to integrate this functionality with Elasticsearch in a friendly way. On the labelled data sets, this module improved the average rank of the user-clicked document on the page by 5 positions (from 11 to 6).

This research provides a viable solution for building and optimising an enterprise search engine. Although open source search engines currently work out of the box, the tuning of auxiliary search functions and the optimization of search results still lack a more complete implementation. This project is set in a real scenario and was developed to deal with huge amounts of data, in order to build a high-performance, user-friendly search engine. The project aims to complete the implementation and record the results of comparative tests, in order to enrich the relevant documentation in the field.
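
As an illustration of the distance-based candidate generation described for the spelling correction module, the following is a minimal sketch that covers only plain edit distance (Levenshtein). The function names, the sample dictionary, and the `max_distance` threshold are hypothetical and are not taken from the thesis. The Pinyin distance, the character string distance, and the learning-to-rank step mentioned in the abstract are not modelled here.

```python
from typing import Iterable, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a rolling one-row table."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (or match)
            ))
        prev = curr
    return prev[-1]


def candidates(word: str, dictionary: Iterable[str],
               max_distance: int = 2) -> List[str]:
    """Return dictionary words within max_distance edits, closest first.

    The threshold of 2 is an assumption made for illustration only.
    """
    scored = [(edit_distance(word.lower(), w.lower()), w) for w in dictionary]
    return [w for d, w in sorted(scored) if d <= max_distance]


if __name__ == "__main__":
    # Hypothetical dictionary; a real system would use the indexed vocabulary.
    print(candidates("serach", ["search", "engine", "research"]))
```

In a pipeline like the one the abstract describes, a candidate list of this kind would then be passed to a ranking model that picks the final correction.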

Keywords/Search Tags:

Search Engine, ElasticSearch, Learning to Rank, Spell Check, Query Recommendation, Personalized Search
