A Distributed Graph Storage And Query System For Web Data Management

Posted on:2010-10-26

Degree:Master

Type:Thesis

Country:China

Candidate:D Tao

Full Text:PDF

GTID:2178360275491626

Subject:Computer software and theory

Abstract/Summary:


With the dramatic development of the World Wide Web (WWW), web data has grown explosively in both quantity and scale, making the web the largest database in the world. Moreover, data associated with the web, such as search engine logs and click records of web services, is also growing rapidly. Compared with traditional data, web data is semi-structured, grows at a high rate, and comes in a wide variety of data types. It is therefore impractical to handle web data in the same way as traditional data. Today there is a large demand for web data analysis technology in many fields, and the topic has attracted increasing attention in database research. We therefore introduce CWI, a new tool for querying and analyzing massive data. In real applications, we need to store and query large-scale data and support both keyword search and queries on data structure. TLGM and TLGM-QL, as parts of CWI, meet these demands. We focus on implementing the TLGM data model in a distributed environment, and we design and implement four basic operators supporting TLGM-QL. During the design phase, we found that the uneven distribution of real-world data degrades the storage and query algorithms and increases their time cost. To solve this problem, we propose a series of algorithms that keep the differences in storage and computation load among data nodes within an acceptable range. On this basis, we present a new subgraph reconstruction algorithm to support queries on graph structure. We also propose several balancing methods to ensure the efficiency of this algorithm.
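The abstract does not describe its balancing algorithms in detail. As one illustration of the stated goal (keeping the load difference between data nodes within a bounded range), the sketch below uses a standard greedy "longest processing time first" heuristic, which is not necessarily the thesis's method. It assigns weighted graph partitions to data nodes, always placing the heaviest remaining partition on the currently least-loaded node, and then checks the max-min load spread against a tolerance. The partition weights (e.g. total vertex degree per partition), the function names `assign_partitions` and `is_balanced`, and the tolerance parameter are all hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def assign_partitions(weights: Dict[str, int],
                      num_nodes: int) -> Tuple[List[List[str]], List[int]]:
    """Hypothetical greedy (LPT) placement of weighted graph partitions.

    Partitions are processed heaviest first; each goes to the data node
    with the smallest current load (ties broken by lower node id).
    """
    heap = [(0, node) for node in range(num_nodes)]  # (current load, node id)
    heapq.heapify(heap)
    placement: List[List[str]] = [[] for _ in range(num_nodes)]
    loads = [0] * num_nodes
    # Sort by descending weight, then by name for a deterministic order.
    for part, weight in sorted(weights.items(), key=lambda kv: (-kv[1], kv[0])):
        load, node = heapq.heappop(heap)
        placement[node].append(part)
        loads[node] = load + weight
        heapq.heappush(heap, (loads[node], node))
    return placement, loads


def is_balanced(loads: List[int], tolerance: int) -> bool:
    """True if the gap between the busiest and idlest node is within tolerance."""
    return max(loads) - min(loads) <= tolerance


if __name__ == "__main__":
    # Skewed partition sizes, as produced by a few high-degree "hub" vertices.
    placement, loads = assign_partitions(
        {"p1": 8, "p2": 7, "p3": 6, "p4": 5, "p5": 4}, num_nodes=2)
    print(placement, loads, is_balanced(loads, tolerance=4))
```

With the weights above and two nodes, the greedy pass places `p1`, `p4`, `p5` on node 0 and `p2`, `p3` on node 1, giving loads 17 and 13, a spread of 4. A real system would also have to re-balance as new web data arrives, which this one-shot sketch does not attempt.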
We design and run experiments on synthetic and real-world data to demonstrate the system's efficiency. The major contributions of this thesis are as follows:
1. We analyze the characteristics of web data and introduce the TLGM model to illustrate how web data differs from traditional data in storage, querying, and indexing. We first try storing graph data in a relational database, design several queries, and run experiments on it. The experimental results show the limitations of centralized storage.
2. We analyze the TLGM model and describe its implementation in a distributed environment. We further summarize the query language supported by this model and propose four basic operators. We use examples to show that these operators are flexible, and we give their pseudocode.
3. We propose a novel subgraph reconstruction algorithm to support queries on graph structure. We implement this algorithm in the MapReduce framework, which gives it good scalability, and we use a caching strategy to improve its efficiency. Because real-world data often causes load imbalance between data nodes, we also improve the balancing methods. Extensive experiments verify the efficiency of our algorithm.
We believe our work is a good practical example of web data storage and querying, since we not only provide key solutions for storing web data as a graph but also implement a novel framework to index and query massive web data. Our work is of considerable significance to the area of web data storage.
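The abstract states that subgraph reconstruction is implemented in MapReduce but gives no details. The sketch below is only an in-memory simulation of the general idea, not the thesis's algorithm and not Hadoop code. Edge records are spread across several "shards" (standing in for data nodes). Each round is one map/shuffle/reduce pass: mappers emit every edge that touches a frontier vertex, keyed by that vertex; the shuffle groups records by key; reducers return each vertex's incident edges. The frontier then advances by one hop. The names `map_phase`, `shuffle`, `reduce_phase`, and `reconstruct_subgraph`, and the choice of a k-round breadth-first expansion from a seed vertex, are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Set, Tuple

Edge = Tuple[str, str]  # (source vertex, target vertex)


def map_phase(edges: Iterable[Edge], frontier: Set[str]) -> Iterator[Tuple[str, Edge]]:
    """Emit (frontier vertex, edge) for every edge touching the frontier."""
    for src, dst in edges:
        if src in frontier:
            yield src, (src, dst)
        if dst in frontier:
            yield dst, (src, dst)


def shuffle(pairs: Iterable[Tuple[str, Edge]]) -> Dict[str, List[Edge]]:
    """Group mapper output by key, as the MapReduce shuffle would."""
    groups: Dict[str, List[Edge]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[Edge]]) -> Iterator[Tuple[str, Set[Edge]]]:
    """Each reducer outputs the de-duplicated edges incident to its vertex."""
    for vertex, edges in groups.items():
        yield vertex, set(edges)


def reconstruct_subgraph(shards: List[List[Edge]], seed: str, hops: int) -> Set[Edge]:
    """Hypothetical k-round expansion: each round keeps every edge incident
    to the current frontier, then moves the frontier one hop outward."""
    visited = {seed}
    frontier = {seed}
    collected: Set[Edge] = set()
    for _ in range(hops):
        # Map runs on every shard (data node); results are merged for shuffle.
        pairs = [pair for shard in shards for pair in map_phase(shard, frontier)]
        round_edges: Set[Edge] = set()
        for _, edges in reduce_phase(shuffle(pairs)):
            round_edges |= edges
        collected |= round_edges
        touched = {v for edge in round_edges for v in edge}
        frontier = touched - visited
        visited |= touched
        if not frontier:  # nothing new reachable; stop early
            break
    return collected


if __name__ == "__main__":
    shards = [[("a", "b"), ("b", "c")], [("c", "d"), ("a", "e")], [("d", "f")]]
    print(sorted(reconstruct_subgraph(shards, "a", 2)))
```

For the sample shards, two rounds from seed `a` collect `("a", "b")`, `("a", "e")`, and `("b", "c")`. Because every round rescans all shards, a caching layer of the kind the abstract mentions (for example, keeping recently reconstructed neighborhoods) would be a natural place to cut repeated work. That layer is omitted here.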

Keywords/Search Tags:

TLGM data model, distributed storage, Web data management


Related items

1	Research On Distributed Storage Management Method For Manufacturing Big Data
2	The Research On Large-Scale Distributed Storage Technology
3	Optimization Algorithm For Data Reconstruction In Distributed Storage Systems
4	Research On Geographical Space-Time Big Data Management System Based On Distributed Storage
5	Storage Of Distributed Data Management In Wireless Sensor Networks
6	The Study On The Key Technology Of Distributed Data Storage
7	Research On Key Techniques Of Distributed Data Processing And Storage
8	Research On Data Management Technology Of Distributed Storage System
9	Design And Implementation Of Business Data Storage And Forwarding Management System
10	Research A Model Of The Metadata Hierarchical Storage In The Distributed Data Register Center Based On The DOA