Research And Application Of Management Of Unstructured Data Based On MongoDB

Posted on: 2018-09-22
Degree: Master
Type: Thesis
Country: China
Candidate: Q Yang
Full Text: PDF
GTID: 2348330512989016
Subject: Software engineering
Abstract/Summary:
With the development of big data technology, the mainstream of Internet data has gradually shifted from structured to unstructured data. However, the variety of types, the huge scale, and the difficulty of normalization make unstructured data hard to manage. Using big data technology to manage unstructured data efficiently is therefore a highly active research direction, and it is the subject of this thesis. The research consists of three parts: MongoDB, the management of unstructured data, and the application of MongoDB to the management of unstructured data. The design and implementation of an unstructured data management system based on MongoDB and other big data technologies is an upgrade project for Comsys's shared data center and is the practical embodiment of this thesis.

First, this thesis introduces the background, source, research content, and contributions of the topic, as well as the current state of NoSQL database research. It then summarizes the research directions in unstructured data management and the current state of research in each. On that basis, through the author's summary and analysis, it presents an overall architecture for unstructured data management and studies the key technologies of each module in depth.

Next, this thesis presents the research and design of the system. By studying the characteristics of MongoDB, HDFS, and relational databases, it settles on a solution in which MongoDB stores metadata and small files and HDFS stores large files. By analyzing how the data collection module is used in the production environment, it designs a data collection coordinator based on ZooKeeper and load weights. By studying the shortcomings of MongoDB's full-text index, it designs a query function based on MongoDB + Elasticsearch that separates data writes from data reads, and proposes a paging optimization method for MongoDB. By studying information silos and information integration, it designs the application service interface. Based on an analysis of the problems of query and computation over large data sets, it designs a statistical analysis function based on MongoDB that separates query from computation. By studying MongoDB's automatic partitioning (sharding) and replica sets, it designs a system with scalability and high availability, and proposes a MongoDB data block migration method based on operation frequency.

Finally, this thesis describes the implementation of the system in detail. MongoDB is physically deployed in accordance with the performance requirements of the system design. The thesis combines HttpClient and POI to implement the active collection function, and provides a method for determining the file type from the identifying bytes of the file stream. In addition, by studying the user experience of data collection, it proposes a cache-based data collection optimization method. It implements the query function based on MongoDB + Elasticsearch and proposes a relevance scoring method that combines Elasticsearch's original scoring function with document popularity (heat). It implements the service interface based on WebService, and file classification based on MapReduce + K-means and document popularity.
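
The abstract mentions determining a file's type from identifying bytes in the file stream. The thesis's own implementation is not shown here, so the following is only a minimal Python sketch of that general idea, under stated assumptions: the signature table, the type labels, and the function name `detect_file_type` are hypothetical and chosen for illustration. The byte signatures themselves are the widely documented leading bytes of each format.

```python
import io
from typing import BinaryIO

# Well-known leading byte signatures ("magic numbers").
# Assumption: this table is illustrative only; the abstract does not
# say which signatures or type labels the thesis actually uses.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"PK\x03\x04", "zip"),  # also the container for .docx/.xlsx/.pptx
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2"),  # legacy .doc/.xls/.ppt
]

# Only this many bytes ever need to be read from the stream.
MAX_SIGNATURE_LEN = max(len(signature) for signature, _ in SIGNATURES)


def detect_file_type(stream: BinaryIO) -> str:
    """Return a type label based on the first bytes of a binary stream.

    Hypothetical helper: reads at most MAX_SIGNATURE_LEN bytes and
    compares them against the signature table above.
    """
    header = stream.read(MAX_SIGNATURE_LEN)
    for signature, file_type in SIGNATURES:
        if header.startswith(signature):
            return file_type
    return "unknown"


if __name__ == "__main__":
    sample = io.BytesIO(b"%PDF-1.7\n...rest of file...")
    print(detect_file_type(sample))
```

Checking the leading bytes rather than the file extension means a renamed or extension-less upload is still classified by its content. Container formats such as ZIP and OLE2 would need a further look inside the file to tell, for example, a .docx from a plain archive.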
Keywords/Search Tags: MongoDB, Unstructured Data, Big Data Technology