
Design And Implementation Of Distributed Storage Optimization For College Entrance Examination Data

Posted on: 2018-02-08, Degree: Master, Type: Thesis
Country: China, Candidate: F P Yu, Full Text: PDF
GTID: 2358330518468362, Subject: Engineering
Abstract/Summary:
In recent years, the rapid development of the information industry has produced a data explosion in many fields, including the college entrance examination. Every year the examination generates massive amounts of data, so how to store this data quickly and efficiently has become an important research question. Faced with data at the TB or even PB level, traditional relational databases fall short in storage capacity. Large-scale data has driven the development of many storage technologies. Among them, Google's GFS and Apache's HDFS are two of the most popular and representative distributed storage technologies for big data. HDFS lets enterprises use a large number of inexpensive machines to store massive data in a distributed way, but its design, in which a single master node controls multiple data nodes, can easily turn the master into a bottleneck. For the college entrance examination data studied in this dissertation, when a large number of students query their examination results, the queries flood into the HDFS master node and place a heavy load on it. To address this problem, after an in-depth study and analysis of HDFS running on a Spark cluster, this dissertation proposes an HDFS + MongoDB distributed storage scheme that relieves the HDFS master node bottleneck, optimizing the distributed storage of college entrance examination data and making result queries more efficient. Based on the above analysis, the main research work of this dissertation is as follows:
(1) First, it introduces the background and significance of the topic and reviews the state of development of distributed storage technology, college entrance examination informatization, and Spark big data platform technology.
(2) Second, by analyzing the master node bottleneck of HDFS, it proposes an HDFS + MongoDB distributed storage optimization scheme for college entrance examination data.
(3) Third, according to the specific requirements of the Entrance Examination Institute, it carries out a detailed requirements analysis covering users, functional requirements, and non-functional requirements, and it designs the overall system architecture, the system functions, the system database, and the HDFS + MongoDB distributed storage.
(4) Fourth, based on the detailed system design, it gives a concrete method for implementing the scheme. The system's functionality is verified by black-box testing, and its performance is tested in three respects: response time, throughput, and concurrency. The test results meet the original targets.
Finally, it summarizes the main content of the dissertation and points out directions for future work.
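The abstract does not spell out how the two stores divide the work. One plausible reading, offered here only as an illustrative sketch, is that bulk files stay in HDFS while per-candidate result records are also indexed in MongoDB, so that result queries never pass through the HDFS master. The Python below uses in-memory stand-ins rather than real HDFS or MongoDB clients; the class names `MasterNodeStub`, `DocumentStoreStub`, and `ResultService`, the file path, and the sample records are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class MasterNodeStub:
    """Hypothetical stand-in for the HDFS master: every file lookup goes through it."""
    block_map: Dict[str, str] = field(default_factory=dict)
    lookups: int = 0  # counts how often the master is consulted

    def locate(self, path: str) -> Optional[str]:
        self.lookups += 1
        return self.block_map.get(path)


@dataclass
class DocumentStoreStub:
    """Hypothetical stand-in for a MongoDB collection keyed by candidate ID."""
    docs: Dict[str, dict] = field(default_factory=dict)

    def find_one(self, candidate_id: str) -> Optional[dict]:
        return self.docs.get(candidate_id)


class ResultService:
    """Routes bulk-file lookups to the master and point queries to the document store."""

    def __init__(self, master: MasterNodeStub, store: DocumentStoreStub) -> None:
        self.master = master
        self.store = store

    def archive(self, path: str, datanode: str, records: List[dict]) -> None:
        # Bulk load: register the file location and index each record by candidate ID.
        self.master.block_map[path] = datanode
        for rec in records:
            self.store.docs[rec["candidate_id"]] = rec

    def query_result(self, candidate_id: str) -> Optional[dict]:
        # Point query: answered from the document store without consulting the master.
        return self.store.find_one(candidate_id)

    def locate_archive(self, path: str) -> Optional[str]:
        # Bulk access: still resolved through the master.
        return self.master.locate(path)


master = MasterNodeStub()
store = DocumentStoreStub()
service = ResultService(master, store)
service.archive(
    "/exam/2017/results.csv",  # hypothetical path
    "datanode-1",
    [
        {"candidate_id": "A001", "total": 612},
        {"candidate_id": "A002", "total": 548},
    ],
)

# Simulate a burst of students querying the same kind of record.
hits = [service.query_result("A001") for _ in range(1000)]
print(len(hits), master.lookups)
```

In this sketch the master's lookup counter stays at zero during the burst of result queries, which is the effect the HDFS + MongoDB scheme is meant to achieve; only bulk access via `locate_archive` reaches the master.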
Keywords/Search Tags:HDFS, Distributed Storage, College Entrance Examination Results Data, Spark, MongoDB