
The Design And Implementation Of A Distributed Full-text Retrieval System Based On SolrCloud Platform

Posted on: 2016-12-25 | Degree: Master | Type: Thesis
Country: China | Candidate: Y D Wang
GTID: 2298330470950829 | Subject: Software engineering
Abstract/Summary:
With the advent of big data, huge amounts of data flow into people's daily lives, studies, and work. How can this rapidly growing data be stored and managed, and how can useful information be extracted from it correctly? Facing these challenges, and with rapid economic development, government agencies of all kinds place ever higher demands on the informatization of their operations, and building a dedicated network information platform has become a top priority. Since they first began informatization, government agencies have accumulated large amounts of structured and unstructured data. The Audit Office in particular holds many report forms and legal documents as text. These records must be kept because staff have to consult the old files whenever audits are carried out or regulations change. In this situation, querying a traditional database or asking Audit Office staff to look things up seriously reduces efficiency. We therefore expect full-text search to meet these needs of government agencies: once all kinds of Audit Office data are indexed in full text, staff can retrieve any type of document when they need specific information. Because audit work is carried out quarterly, the workload, and with it the query load, is concentrated in particular periods. Based on this analysis, we decided to build a distributed full-text retrieval system for the Audit Office. In this thesis, we set up a Solr server cluster coordinated by ZooKeeper to form a SolrCloud platform, and implement the creation, storage, and management of indexes on SolrCloud. On this foundation, we build a distributed full-text retrieval system. Users log in to the system, and ordinary users can search, preview, and download documents internally.
In addition, administrators can upload, store, and manage these documents, so that information is shared within the government agencies.
This thesis discusses the background and current state of full-text search and distributed search engines, introduces the architecture and characteristics of SolrCloud, and describes the mechanism of full-text retrieval. Next, based on the specific requirements of the Audit Office, we analyze the requirements of the distributed full-text retrieval system from four aspects, and then present the development environment design, the overall design, and the detailed design of the system. The detailed design gives the schemes for SolrCloud server clustering, distributed indexing, and distributed search. We then describe how this design is implemented and test the performance of index creation and the response time of retrieval. Finally, we briefly summarize the work in this thesis and propose directions for further research.
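The abstract names the inverted index as the mechanism behind full-text retrieval and splits the detailed design into distributed indexing and distributed search. The Python sketch below illustrates how these three ideas fit together. It is not the thesis's implementation and not Solr's internals. Each shard keeps its own inverted index. Each document is routed to exactly one shard by hashing its ID. A query is sent to every shard, and the shards' partial results are merged. SolrCloud also routes documents by a hash of their IDs, but the `zlib.crc32`-modulo router, the ASCII-only tokenizer, the term-frequency score, and the names `Shard`, `Cluster`, `tokenize`, and `rank_key` are hypothetical simplifications. Real Solr uses configurable analyzers (Chinese text needs a suitable tokenizer) and a relevance model such as TF-IDF or BM25.

```python
import heapq
import re
import zlib
from collections import defaultdict


def tokenize(text):
    # Hypothetical analyzer: lowercase ASCII letters and digits only.
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_key(hit):
    # Higher score first; ties broken by document ID for deterministic output.
    score, doc_id = hit
    return (-score, doc_id)


class Shard:
    """One shard: an inverted index mapping term -> {doc_id: term frequency}."""

    def __init__(self):
        self.postings = defaultdict(dict)

    def add(self, doc_id, text):
        for term in tokenize(text):
            self.postings[term][doc_id] = self.postings[term].get(doc_id, 0) + 1

    def search(self, terms, k):
        if not terms:
            return []
        # AND semantics: intersect the posting lists of all query terms.
        candidates = set(self.postings.get(terms[0], {}))
        for term in terms[1:]:
            candidates &= set(self.postings.get(term, {}))
        # Toy relevance score: sum of raw term frequencies.
        scored = [(sum(self.postings[t][d] for t in terms), d) for d in candidates]
        # Each shard returns only its local top-k hits.
        return heapq.nsmallest(k, scored, key=rank_key)


class Cluster:
    """Hypothetical stand-in for a collection split across several shards."""

    def __init__(self, num_shards):
        self.shards = [Shard() for _ in range(num_shards)]

    def route(self, doc_id):
        # Distributed indexing: a hash of the document ID picks exactly one shard.
        return zlib.crc32(doc_id.encode("utf-8")) % len(self.shards)

    def index(self, doc_id, text):
        self.shards[self.route(doc_id)].add(doc_id, text)

    def search(self, query, k=10):
        terms = tokenize(query)
        # Distributed search, scatter: every shard computes its local top-k ...
        partial = [hit for shard in self.shards for hit in shard.search(terms, k)]
        # ... gather: merge the partial lists into the global top-k.
        return heapq.nsmallest(k, partial, key=rank_key)


cluster = Cluster(num_shards=2)
cluster.index("report-1", "Quarterly audit report on budget execution")
cluster.index("law-7", "Audit law: regulations on audit procedures")
cluster.index("memo-3", "Meeting memo on staffing")
print(cluster.search("audit"))  # [(2, 'law-7'), (1, 'report-1')]
```

Each shard returns only its local top k results, and this is enough to recover the global top k. Any document in the global top k must also rank within the top k of its own shard. As a result, the merge step stays small while the index itself is spread across servers.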
Keywords/Search Tags: distributed, full-text, inverted index, SolrCloud, ZooKeeper