
Research On The Optimization Of Network Community Data Storage And Retrieval Technology

Posted on: 2017-08-27
Degree: Master
Type: Thesis
Country: China
Candidate: J Guo
Full Text: PDF
GTID: 2348330518995970
Subject: Computer Science and Technology
Abstract/Summary:
In a network community, the data are large in scale, varied in type, and complex in structure. Analyzing the content of community posts makes it possible to grasp, promptly and accurately, the hot topics that concern ordinary people. How to effectively organize, store, and retrieve the massive real-time data of a network community is the key difficulty in analyzing those topics. This thesis addresses three problems: the low efficiency of storing large files in the network community, the "hot spot" problem encountered when storing large volumes of posts efficiently, and the low efficiency of searching large-scale data. For the large-file storage problem, an improved hybrid storage mechanism based on HBase and HDFS is proposed: two capacity thresholds are set appropriately, and files are classified by size and stored accordingly. For the "hot spot" problem, a pre-partitioning and hashing scheme is proposed, based on the characteristics of large-scale network community data: the HBase table is divided into several Regions when it is created, and the Rowkey design is optimized. For the retrieval problem, an optimized retrieval scheme based on Solr is proposed: Solr is integrated with HBase to build a full-text index over the large-scale community data. The experimental results show that the improved hybrid storage mechanism effectively improves the efficiency of large-file storage in the network community. The pre-partitioning and hashing scheme spreads the stored posts across the Regions of the HBase table, solving the "hot spot" problem and achieving load balancing. The Solr-based retrieval scheme significantly shortens the time needed to retrieve data by non-primary-key fields and effectively improves the performance of large-scale data retrieval. Together, these measures optimize data storage and retrieval for the network community.
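
The pre-partitioning and hashing idea can be illustrated with a small, self-contained sketch. It is not the thesis's actual Rowkey design, which the abstract does not specify. The hash function (MD5), the bucket count of 8, the two-digit prefix format, and the names `salt_for`, `make_rowkey`, `split_points`, and `region_of` are all assumptions chosen for illustration. No HBase client is used; the code only simulates which pre-split Region a Rowkey would fall into.

```python
import hashlib
from collections import Counter

NUM_REGIONS = 8  # assumed region count; the abstract does not give one (sketch supports up to 100)


def salt_for(post_id: str, num_regions: int = NUM_REGIONS) -> int:
    """Map a post ID to a stable bucket in [0, num_regions) using MD5."""
    digest = hashlib.md5(post_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_regions


def make_rowkey(post_id: str, num_regions: int = NUM_REGIONS) -> str:
    """Prefix the post ID with its two-digit bucket number (hypothetical format)."""
    return f"{salt_for(post_id, num_regions):02d}|{post_id}"


def split_points(num_regions: int = NUM_REGIONS) -> list[str]:
    """Split keys for table creation: one boundary between adjacent buckets."""
    return [f"{bucket:02d}" for bucket in range(1, num_regions)]


def region_of(rowkey: str, splits: list[str]) -> int:
    """Index of the region whose key range holds rowkey (lexicographic order)."""
    return sum(1 for split in splits if rowkey >= split)


if __name__ == "__main__":
    splits = split_points()
    # Sequential IDs would all land in one region without the hashed prefix.
    load = Counter(region_of(make_rowkey(f"post-{i:06d}"), splits) for i in range(8000))
    for region in sorted(load):
        print(region, load[region])
```

In a real deployment the split keys would be supplied when the table is created, for example through the `SPLITS` option of the HBase shell `create` command. The trade-off of a hashed prefix is that Rowkeys are no longer stored in their natural order, so a range scan over post IDs has to query every bucket.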
Keywords/Search Tags:network community, classified storage, pre-partitioning, hash, full-text index