
The Implementation And Optimization Of Large Scale Enterprise Search Engine

Posted on: 2016-04-27
Degree: Master
Type: Thesis
Country: China
Candidate: M Xuan
Full Text: PDF
GTID: 2308330479482181
Subject: Software engineering
Abstract/Summary:
When an enterprise holds a large volume of documentation, it typically relies on a search engine for information retrieval. The traditional approach is to purchase a commercial search engine. However, enterprise documents differ from web pages: each one is much larger than a typical web page. When handling data at this scale, traditional general-purpose search engines usually perform poorly.

This project presents a solution for building an enterprise search engine that can handle large-scale data and achieve state-of-the-art performance. Its core is the open-source distributed search engine Elasticsearch, with its configuration optimized for the way these documents are retrieved. The optimizations include redesigning the index schema and adapting the index storage strategy, among others. These measures reduced the index size by 55%, shortened the index rebuild time by 50%, and increased index query response speed by 45%, all without removing the highlighting functionality.

The project also developed several core search functions that give users a better search experience: spelling correction, query recommendation, and personalized search results.

The Spell Correction module handles misspelled Chinese words, misspelled English words, and spelling errors in camel-case strings, and it supports a user-defined dictionary for common spelling errors. The module generates candidate corrections mainly from distance measures (edit distance, Pinyin distance, character string distance, and so on), then uses learning to rank to select the best correction. It scored 86% on a best-N measure over a selected test data set.

The Query Recommendation module helps users refine their queries. It generates recommendations from the query logs and the document corpus, then uses learning to rank to select the top 10 recommendations as the result. On selected test data with human-labeled ground truth, the module achieved 100% coverage and 92.6% accuracy.

The Personalized Search module aims to provide the best search results for a specific user. It relies on a user model built from topic-model training and the user's click history. We also designed a method to integrate this functionality cleanly with Elasticsearch. On the labeled data sets, the module improved the average rank of the user-clicked document by 5 positions (from 11 to 6).

This research offers a viable solution for building and optimizing an enterprise search engine. Although open-source search engines now work out of the box, complete implementations of auxiliary search functions, tuning, and search-result optimization are still lacking. Set in a real-world scenario with very large amounts of data, the project aims to build a high-performance, user-friendly search engine. It also aims to document a complete implementation and record the results of comparative tests, in order to enrich the available documentation in this field.
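The distance-based candidate generation described for the Spell Correction module can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the `max_distance` threshold, and the shape of the user-defined dictionary are assumptions made for illustration. It covers only plain edit distance (not Pinyin or camel-case handling), and sorting by distance stands in for the learning-to-rank step.

```python
from typing import Dict, List, Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def candidates(query: str,
               vocabulary: List[str],
               max_distance: int = 2,
               custom: Optional[Dict[str, str]] = None) -> List[str]:
    """Return vocabulary words within max_distance of query, nearest first.

    `custom` is a hypothetical user-defined dictionary mapping a known
    misspelling directly to its correction; it takes priority.
    """
    if custom and query in custom:
        return [custom[query]]
    scored = [(edit_distance(query, word), word) for word in vocabulary]
    # Placeholder ranking: a learning-to-rank model would reorder these.
    return [word for dist, word in sorted(scored) if dist <= max_distance]
```

For example, `candidates("serach", ["search", "engine", "seared"])` keeps only `"search"`, since the other words lie more than two edits away.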
Keywords/Search Tags: Search Engine, ElasticSearch, Learning to Rank, Spell Check, Query Recommendation, Personalize Search