
The Design And Implementation Of Site Search Engine Based On The Inverted Index And The Trie

Posted on: 2017-04-27 | Degree: Master | Type: Thesis
Country: China | Candidate: F Y Sun
GTID: 2308330509457583 | Subject: Software engineering
Abstract/Summary:
With the continuous development of the Internet and an increasingly fast-paced life, users expect a better experience and find long waits for search results intolerable. Many mobile apps and websites share the same concern: how to make search faster, return better results, and offer recommendations that better match what users have in mind. This project aims to improve the user experience, and the search experience in particular, by building an inverted index to speed up the search phase and a digital tree (trie) to quickly find pinyin association words. The project implements a lightweight search engine that can easily be ported to other systems, making software development more efficient and less costly.

The system consists of two parts. The first is the full-text indexing engine, which is mainly responsible for building the inverted index from the data source, storing the index in an efficient format, incremental updates, index compression, search result ranking, and automatic summarization. The second is the intelligent search engine, which mainly provides keyword search, fuzzy search, pinyin association, and related functions. In addition, the project implements a Web API and a data source API.

The author independently carried out the requirements analysis, system design, implementation, and testing of the site search engine. The system consists of eight core modules: (1) document data source acquisition, (2) inverted index construction and compression, (3) inverted index update, (4) inverted index lookup, (5) search result ranking, (6) pinyin conversion, (7) trie construction and lookup for pinyin search, and (8) pinyin word association. Two auxiliary modules provide keyword highlighting and related recommendations. Scalability, portability, and practicality were major concerns throughout the system design.
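The indexing engine described above rests on an inverted index: each term maps to the documents (and positions) where it occurs, and a multi-term keyword query intersects those posting lists. A minimal in-memory sketch in Python, assuming a hypothetical tokenizer that keeps only lowercase alphanumeric runs (a real Chinese site search would need word segmentation, which is not shown) and plain dictionaries rather than the thesis's on-disk index format:

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase alphanumeric runs only.
    # Chinese text would need a proper word segmenter instead.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[int, str]) -> dict[str, dict[int, list[int]]]:
    """Map each term to {doc_id: [positions]}, i.e. a positional posting list."""
    index: dict[str, dict[int, list[int]]] = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def search_all(index: dict[str, dict[int, list[int]]], query: str) -> list[int]:
    """Return the sorted ids of documents containing every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return []
    # .get() avoids inserting empty entries into the defaultdict.
    postings = sorted((set(index.get(t, {})) for t in terms), key=len)
    # Start from the shortest posting list so each intersection stays small.
    result = postings[0]
    for p in postings[1:]:
        result &= p
    return sorted(result)
```

For example, indexing `{1: "weather forecast API", 2: "weather history API", 3: "stock quote API"}` and searching `"weather api"` returns `[1, 2]`. Keeping positions in the postings is not needed for AND queries, but it is what phrase matching and highlighting would build on.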
The system uses a disk-based sort-merge algorithm to sort data that cannot fit in memory, which improves the system's availability. It compresses the inverted index with the cidx Hit algorithm, so the index occupies less memory without reducing search efficiency. It computes relevance scores with the BM25 algorithm and gains flexibility by assigning different weights to different fields. Because the pinyin word generator must give users fast feedback, a trie (dictionary tree) was chosen as its data structure, providing timely and effective suggestions and improving the user experience. The final site search response time is about 0.02 s and the pinyin association response time is 2 ms, which effectively ensures the availability and usability of the system. The system described in this thesis is deployed on the official Baidu APIStore website (http://apistore.baidu.com/).
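The disk-based sort-merge step is the classic external merge sort: sort memory-sized chunks, spill each one to disk as a sorted run, then merge all runs in a single streaming pass. A minimal Python sketch, assuming records are newline-free text lines (for index building these might be hypothetical `term\tdoc_id` lines with zero-padded ids, so that string order matches numeric order) and that `run_size`, a hypothetical parameter, is the number of lines that fit in memory:

```python
import heapq
import os
import tempfile
from typing import Iterable, Iterator


def _write_run(sorted_lines: list[str]) -> str:
    # Spill one sorted run to a temporary file and return its path.
    fd, path = tempfile.mkstemp(suffix=".run", text=True)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.writelines(line + "\n" for line in sorted_lines)
    return path


def external_sort(lines: Iterable[str], run_size: int) -> Iterator[str]:
    """Sort more lines than fit in memory: sorted runs on disk, then a k-way merge."""
    run_paths: list[str] = []
    buffer: list[str] = []
    for line in lines:
        buffer.append(line)
        if len(buffer) >= run_size:
            run_paths.append(_write_run(sorted(buffer)))
            buffer = []
    if buffer:
        run_paths.append(_write_run(sorted(buffer)))

    files = [open(p, encoding="utf-8") for p in run_paths]
    try:
        # One lazy stream per run; heapq.merge keeps one pending line per stream.
        streams = [(l.rstrip("\n") for l in f) for f in files]
        yield from heapq.merge(*streams)
    finally:
        # Clean up the temporary run files once the merge is finished.
        for f, p in zip(files, run_paths):
            f.close()
            os.remove(p)
```

For instance, `list(external_sort(["b", "d", "a", "c", "e"], run_size=2))` spills three runs and returns `['a', 'b', 'c', 'd', 'e']`. During the merge only about one line per run (plus file buffers) is held in memory, which is what lets the approach scale past available RAM.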
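The abstract does not describe how the cidx Hit algorithm works, so it is not reproduced here. To illustrate the general idea of posting-list compression, the sketch below instead uses a different, widely used technique: store the gaps between sorted document ids rather than the ids themselves, then variable-byte encode each gap so that small gaps take a single byte. The function names are hypothetical, and the input ids must be sorted in ascending order:

```python
def vbyte_encode(doc_ids: list[int]) -> bytes:
    """Delta-encode sorted doc ids, then variable-byte encode each gap."""
    out = bytearray()
    prev = 0
    for doc_id in doc_ids:
        gap = doc_id - prev  # requires ascending ids, so gap >= 0
        prev = doc_id
        # Emit 7 bits per byte, low-order group first; the high bit marks the final byte.
        while gap >= 128:
            out.append(gap & 0x7F)
            gap >>= 7
        out.append(gap | 0x80)
    return bytes(out)


def vbyte_decode(data: bytes) -> list[int]:
    """Invert vbyte_encode: rebuild each gap, then prefix-sum gaps back into ids."""
    doc_ids: list[int] = []
    cur, shift, prev = 0, 0, 0
    for byte in data:
        if byte & 0x80:  # final byte of the current gap
            cur |= (byte & 0x7F) << shift
            prev += cur
            doc_ids.append(prev)
            cur, shift = 0, 0
        else:
            cur |= byte << shift
            shift += 7
    return doc_ids
```

For example, the ids `[3, 130, 131]` become the gaps `[3, 127, 1]`, each of which fits in one byte, so the list encodes to three bytes in total.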
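Field-weighted BM25 can be read in more than one way; one simple interpretation is to score each field (for example a title and a body) with standard Okapi BM25 and add the per-field scores, each multiplied by a field weight. A minimal Python sketch under that assumption, using the common defaults `k1 = 1.2`, `b = 0.75` and the non-negative IDF variant ln(1 + (N - n + 0.5) / (n + 0.5)); the field names and weights are hypothetical, and BM25F, which combines term frequencies across fields before saturation, is a different alternative not shown here:

```python
import math
from collections import Counter


def bm25_field(query_terms: list[str], field_docs: list[list[str]],
               k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of every document for one field.

    field_docs holds one token list per document, in the same order for every field.
    """
    n_docs = len(field_docs)
    avg_len = sum(len(d) for d in field_docs) / n_docs or 1.0
    df: Counter = Counter()
    for tokens in field_docs:
        df.update(set(tokens))  # document frequency: count each term once per doc
    scores = [0.0] * n_docs
    for i, tokens in enumerate(field_docs):
        tf = Counter(tokens)
        # Length normalization: longer-than-average fields are penalized.
        norm = k1 * (1 - b + b * len(tokens) / avg_len)
        for t in query_terms:
            if tf[t] == 0:
                continue
            idf = math.log(1 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))
            scores[i] += idf * tf[t] * (k1 + 1) / (tf[t] + norm)
    return scores


def weighted_bm25(query_terms: list[str], fields: dict[str, list[list[str]]],
                  weights: dict[str, float]) -> list[float]:
    """Sum per-field BM25 scores, each scaled by its (hypothetical) field weight."""
    n_docs = len(next(iter(fields.values())))
    total = [0.0] * n_docs
    for name, field_docs in fields.items():
        w = weights.get(name, 1.0)  # unlisted fields default to weight 1.0
        for i, s in enumerate(bm25_field(query_terms, field_docs)):
            total[i] += w * s
    return total
```

With a title weight of 3.0 and a body weight of 1.0, a query term that appears once in one document's title outscores the same term appearing once in another document's body of equal length by a factor of three, which is the kind of per-field control the abstract describes.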
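Pinyin association is a prefix lookup: as the user types letters, the engine walks down the trie one character per keystroke and returns the words stored in the subtree beneath that node. A minimal Python sketch, assuming pinyin conversion has already produced `(pinyin, word, weight)` entries; the entries, weights, and class names are hypothetical, suggestions are ranked by weight, and ties are broken by the word's string order:

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    children: dict = field(default_factory=dict)
    # Words whose full pinyin spelling ends at this node, mapped to a popularity weight.
    words: dict = field(default_factory=dict)


class PinyinTrie:
    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, pinyin: str, word: str, weight: int = 1) -> None:
        node = self.root
        for ch in pinyin:
            node = node.children.setdefault(ch, TrieNode())
        node.words[word] = node.words.get(word, 0) + weight

    def suggest(self, prefix: str, k: int = 5) -> list[str]:
        """Return up to k words whose pinyin starts with prefix, heaviest first."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []  # no stored pinyin has this prefix
        # Collect every word in the subtree below the prefix node (iterative DFS).
        found = []
        stack = [node]
        while stack:
            cur = stack.pop()
            found.extend(cur.words.items())
            stack.extend(cur.children.values())
        found.sort(key=lambda item: (-item[1], item[0]))
        return [word for word, _ in found[:k]]
```

Usage: after inserting `("tianqi", "天气", 50)`, `("tianqiyubao", "天气预报", 30)` and `("tiantang", "天堂", 10)`, calling `suggest("tian")` returns `['天气', '天气预报', '天堂']`. Collecting the whole subtree is simple but grows with the number of matches; a production version might cache the top-k suggestions at each node, trading memory for the fast feedback the abstract emphasizes.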
Keywords/Search Tags: Web Search, Inverted Index, Pinyin Association, Trie