Research on an RDF Data Storage System Based on Property Tables

Posted on: 2014-02-15 | Degree: Master | Type: Thesis
Country: China | Candidate: C K Tao | Full Text: PDF
GTID: 2248330395495514 | Subject: Computer technology
Abstract/Summary:
The Semantic Web is an extension of the current World Wide Web. By including semantic content in web pages, it lets machines understand and process the knowledge they contain. RDF is a knowledge representation model used in the Semantic Web, and it has been widely adopted as the Semantic Web has developed. As the amount of RDF data grows, storing large-scale RDF data efficiently becomes challenging. This paper investigates storage techniques for RDF property tables, in particular property selection algorithms and dynamic adjustment algorithms.

Previous research indicates that different RDF datasets and workloads require different storage strategies, and that no existing RDF storage method performs well in all scenarios. Property tables are considered promising because their schemas can be changed according to the workload. Using property tables requires choosing a property selection algorithm. Most existing methods apply algorithms designed for other fields, such as the Apriori algorithm from data mining or vertical partitioning algorithms from distributed database systems. This paper proposes a new property selection algorithm designed for RDF data. The new algorithm not only selects properties based on the workload but also reduces the number of join operations.

Building on the property selection algorithm, this paper designs property table adjustment strategies that make use of the latest query information while the system is running. Because adjusting property tables is often costly, most existing algorithms create the property tables offline. This paper proposes an algorithm that judges the system's load level. The algorithm borrows the idea of a PID controller: by measuring requests and responses, it can tell qualitatively whether the system is idle. In addition, this paper proposes an incremental schema adjustment algorithm. Changes are made at the attribute level and only when the system is idle, which reduces the impact of dynamic property table adjustment.

Finally, this paper adds property table storage to Jena, an open source Semantic Web toolkit. The query processing module in Jena is modified to redirect data accesses to property tables where possible. To evaluate the performance of property table storage in realistic application environments, this paper also adds simulation of a given number of users to the SP2Bench SPARQL benchmark tool. Experiments on the modified Jena and SP2Bench show that the new property selection algorithm and the adjustment timing algorithm can significantly improve query performance.
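
As a rough illustration of why a property table can reduce join operations, the sketch below groups a chosen set of properties into one row per subject, so that a query touching several properties of the same subject reads a single row instead of joining triples. It is not the algorithm proposed in the thesis: the sample triples, the function names `build_property_table` and `titles_in_year`, and the assumption that every property is single-valued are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical toy data: (subject, property, object) triples.
TRIPLES = [
    ("article1", "title", "RDF Storage"),
    ("article1", "year", "2013"),
    ("article1", "pages", "12"),
    ("article2", "title", "SPARQL Benchmarks"),
    ("article2", "year", "2012"),
]


def build_property_table(triples, properties):
    """Group the selected properties into one row per subject.

    Assumes each selected property is single-valued per subject.
    Triples whose property is not selected stay in a leftover list,
    standing in for the ordinary triple store.
    """
    rows = defaultdict(lambda: dict.fromkeys(properties))
    leftover = []
    for subject, prop, obj in triples:
        if prop in properties:
            rows[subject][prop] = obj
        else:
            leftover.append((subject, prop, obj))
    return dict(rows), leftover


def titles_in_year(table, year):
    """Answer a two-property query by scanning rows, with no self-join."""
    return sorted(
        row["title"] for row in table.values() if row["year"] == year
    )


table, leftover = build_property_table(TRIPLES, ["title", "year"])
print(table["article1"])
print(titles_in_year(table, "2012"))
```

On a plain triple layout the same query would need to match the `title` and `year` triples of each subject against each other, which is the kind of join a well-chosen property table is meant to avoid. Which properties to group together is exactly what a property selection algorithm has to decide.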
Keywords/Search Tags: RDF data, property table, storage system, PID controller, Jena