
The Research And Implementation Of The Massive Documents Full-text Retrieval System Based On NoSQL

Posted on: 2016-10-22
Degree: Master
Type: Thesis
Country: China
Candidate: X W Huang
GTID: 2308330470471094
Subject: Computer application technology
Abstract/Summary:
The massive documents full-text retrieval system based on NoSQL is a new document management system that combines distributed full-text retrieval with distributed storage. It offers a scheme for upgrading document management systems so that document storage meets the requirements of massive storage and document retrieval is accurate and efficient.
Traditional document management systems use the external characteristics of documents as retrieval keywords and establish associations between these keywords and the content of the documents. The associations are saved in a relational database, and the documents themselves are stored on the file system of the operating system. This approach causes two problems. First, the documents are stored on a single server, which limits storage capacity and makes the system hard to scale out. Second, retrieval by the external features of documents gives low search accuracy. To solve these two problems, this paper introduces two newer technologies, NoSQL and a distributed full-text search engine, into the document management system.
MongoDB is a well-known NoSQL database that ranks among the top five on DB-Engines and is widely used. This paper analyzes the characteristics, functions, and suitable application scenarios of MongoDB, and focuses on how MongoDB implements sharding and replication, which provides a good case for the study of distributed storage.
ElasticSearch is an open-source, distributed, RESTful search engine based on Lucene. Although it is a relatively new distributed search engine, it performs well in terms of performance, scalability, and maturity. This paper studies the ElasticSearch source code in depth and analyzes how it implements distributed search.
The main content of this paper is the research and implementation of the massive documents full-text retrieval system based on NoSQL. According to the functional requirements, the system architecture is designed on the modular plug-in component structure of OSGi, and the service functions are divided into four main modules: document processing, text extraction, text indexing, and query. The plug-in structure keeps the system extensible and allows support for new document formats to be added. The problem of massive document storage is solved by MongoDB, and the problem of the accuracy and efficiency of full-text retrieval is solved by applying ElasticSearch.
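
The four-module, plug-in structure described above can be illustrated with a minimal sketch. This is not the thesis implementation: the thesis builds on OSGi, MongoDB, and ElasticSearch, whereas the sketch below uses plain Python, an in-memory dictionary as a stand-in for the document store, and a simple inverted index as a stand-in for the search engine. All names (`EXTRACTORS`, `register_extractor`, `DocumentPipeline`) are hypothetical and chosen only for illustration.

```python
import os
import re
from collections import defaultdict
from typing import Callable, Dict, Set

# Hypothetical plug-in registry: file extension -> text extraction function.
EXTRACTORS: Dict[str, Callable[[bytes], str]] = {}


def register_extractor(extension: str):
    """Decorator that registers an extractor plug-in for one document format."""
    def decorator(func: Callable[[bytes], str]) -> Callable[[bytes], str]:
        EXTRACTORS[extension.lower()] = func
        return func
    return decorator


@register_extractor(".txt")
def extract_plain_text(raw: bytes) -> str:
    """Text extraction plug-in for plain-text documents."""
    return raw.decode("utf-8", errors="replace")


def tokenize(text: str) -> list:
    """Split text into lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


class DocumentPipeline:
    """Document processing, text extraction, text indexing, and query."""

    def __init__(self) -> None:
        # Stand-in for the distributed document store (assumption).
        self.store: Dict[str, bytes] = {}
        # Stand-in for the search engine: term -> set of document ids.
        self.index: Dict[str, Set[str]] = defaultdict(set)

    def add_document(self, doc_id: str, filename: str, raw: bytes) -> None:
        # Document processing: pick the plug-in by file extension.
        extension = os.path.splitext(filename)[1].lower()
        extractor = EXTRACTORS.get(extension)
        if extractor is None:
            raise ValueError(f"no extractor registered for {extension!r}")
        self.store[doc_id] = raw
        # Text extraction, then text indexing.
        for term in tokenize(extractor(raw)):
            self.index[term].add(doc_id)

    def query(self, text: str) -> Set[str]:
        """Return the ids of documents that contain every query term."""
        terms = tokenize(text)
        if not terms:
            return set()
        result = set(self.index.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.index.get(term, set())
        return result
```

In this sketch, supporting a new document format only requires registering another extractor function; the storage, indexing, and query code stays unchanged, which is the property the plug-in structure is meant to provide.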
Keywords/Search Tags:NoSQL, MongoDB, ElasticSearch, Distributed, Search engines