Construction Of The Distributed Data Register Center And Preliminary Implementation Of Retrieval Function Based On The DOA

Posted on:2015-04-23

Degree:Master

Type:Thesis

Country:China

Candidate:M J Du

GTID:2298330467966174

Subject:Computer application technology

Abstract/Summary:

With the arrival of the Big Data era, people in all walks of life around the world are troubled by the management, processing and querying of massive data. The proposal of DOA addresses these problems well. As the core of DOA, the data registry is the top priority of the entire architecture: it stores a flood of rapidly growing metadata, and a data registry built on a traditional database clearly struggles to meet these requirements. To address this, this thesis uses a NoSQL database to design a distributed data registry.

The distributed data registry holds many metadata entries that describe data with the same functions; that is, these metadata share the same keywords. Consequently, when a user searches for a given keyword, a large number of results is returned, and it is difficult to choose among them quickly. To address this, this thesis introduces the concept of data quality and proposes a new data-quality-based sorting algorithm for the distributed data registry, which quickly sorts and selects from massive data on the MapReduce processing framework.

This thesis first illustrates the main functions of the data registry and then introduces the key technologies.
These include Hadoop, MapReduce, NoSQL and Cassandra. Based on a thorough investigation and detailed analysis of the requirements, the outline design and the detailed design were completed. The work completed in this thesis is as follows:
1. Distributed storage of metadata was realized with a local Cassandra database, completing the design of the distributed data registry.
2. Keyword-based metadata retrieval was realized: the keyword inverted-index column families store the comprehensive data-quality scores, so that sorted metadata results can be returned quickly.
3. An offline sorting algorithm based on a weighted sum of data-quality scores was implemented on the MapReduce framework for parallel computation. With this algorithm, the keyword inverted-index column families can be updated, that is, re-sorted, periodically.

The main contents and innovations of this thesis can be summarized as follows:
1. A new method was proposed for the distributed data registry to handle massive data. By storing metadata in a distributed fashion in a local Cassandra database, the registry overcomes problems of traditional registries, such as slow queries over big data, lack of data backup and limited scalability.
2. A new design model for the data registry was presented, in contrast to designs based on relational databases. The metadata storage structure was refactored following a query-oriented design idea: each metadata entry is stored in four column families, namely the metadata description column family, the categorization column family, the keyword inverted-index column family and the default-weights column family.
3. A concrete design for introducing the notion of data quality into the data registry was put forward. It defines seven standards, stored as metadata columns in the metadata description column family, which can be quantified and used to calculate data quality.
4. A sorting algorithm based on the data quality of metadata was proposed. Combining the MapReduce framework for parallel computation with specific features of Cassandra, a sorting algorithm was designed whose results, ranked by the weighted sum of data-quality scores, can serve as the basis for selection, making rapid retrieval much more convenient for users.
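The offline, data-quality-weighted re-sorting of the keyword inverted index described above can be sketched as follows. This is a minimal in-process sketch under stated assumptions, not the thesis's implementation: the seven standards are given hypothetical names (`standard_1` to `standard_7`) because the abstract does not name them, the default weights are hypothetical equal weights, and the map, shuffle and reduce phases are simulated in plain Python instead of running as a Hadoop job against Cassandra. Each keyword's output list stands in for one row of the keyword inverted-index column family, holding (score, data id) pairs in descending score order.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass

# Hypothetical placeholder names: the thesis defines seven data-quality
# standards, but the abstract does not name them.
STANDARDS = [f"standard_{i}" for i in range(1, 8)]

# Hypothetical default weights (the thesis keeps weights in a "default
# weights" column family); here they are simply equal and sum to 1.
DEFAULT_WEIGHTS = {s: 1 / len(STANDARDS) for s in STANDARDS}


@dataclass
class Metadata:
    """One sketched row of the metadata description column family."""
    data_id: str
    keywords: list[str]
    quality: dict[str, float]  # standard name -> quantified score


def quality_score(meta: Metadata, weights: dict[str, float]) -> float:
    """Comprehensive data-quality score: weighted sum over the standards."""
    return sum(w * meta.quality.get(s, 0.0) for s, w in weights.items())


def map_phase(meta: Metadata, weights: dict[str, float]):
    """Map: score the record once, emit (keyword, (score, data_id))."""
    score = quality_score(meta, weights)
    for kw in meta.keywords:
        yield kw, (score, meta.data_id)


def reduce_phase(keyword: str, values: list[tuple[float, str]]):
    """Reduce: sort one keyword's postings by descending score, ties by id."""
    return keyword, sorted(values, key=lambda v: (-v[0], v[1]))


def rebuild_inverted_index(records, weights=DEFAULT_WEIGHTS):
    """Offline re-sort: map every record, group by keyword, reduce each group."""
    groups: dict[str, list[tuple[float, str]]] = defaultdict(list)
    for meta in records:
        for kw, value in map_phase(meta, weights):
            groups[kw].append(value)  # the "shuffle" step
    return dict(reduce_phase(kw, vals) for kw, vals in groups.items())


if __name__ == "__main__":
    records = [
        Metadata("d1", ["weather"], {s: 0.9 for s in STANDARDS}),
        Metadata("d2", ["weather", "rain"], {s: 0.5 for s in STANDARDS}),
    ]
    index = rebuild_inverted_index(records)
    print([data_id for _, data_id in index["weather"]])  # ['d1', 'd2']
```

Scoring happens in the map phase so each record is scored once, and the reduce phase only sorts; a keyword query then reads an already-ordered row instead of ranking results at query time. How the thesis writes the re-sorted rows back to Cassandra is not described in the abstract, so that step is left out here.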

Keywords/Search Tags:

Data Register Center, Cassandra, MapReduce, big data

Related items

1	Research A Model Of The Metadata Hierarchical Storage In The Distributed Data Register Center Based On The DOA
2	Cassandra Data Management For MMO Games And The Implementation Of Integrated Website
3	Design And Implementation Of Historical Data Analysis System For Hazardous Chemicals Transportation Based On CASSANDRA Database
4	Research On Mapreduce-oriented Data Center Networks With Recursively Hierarchical Structures In The Integrated Information Infrastructure
5	Research And Implementation Of Subject-Oriented Structured Data Integration On Multiple Web Sources
6	The Visualization Research Of Data Register Center In DOA
7	Discussion And Preliminary Application Of Data-oriented Software Engineering Method
8	Partition Storage In The Data Register Center
9	MapReduce Job Oriented Collaborative Optimization On Cloud Data Center Network Resource
10	The Research And Implementation Of Memory Management Mechanism For Cassandra