
The Research On The Indexing For RDF

Posted on: 2011-02-11
Degree: Master
Type: Thesis
Country: China
Candidate: H Sun
GTID: 2178360305954912
Subject: Computer application technology
Abstract/Summary:
RDF is an important means of managing Semantic Web data, and RDF data management is itself an important area of Semantic Web research. Although researchers have long shown great interest in realizing Semantic Web capabilities, most RDF data management strategies are limited in processing efficiency and in their ability to handle large-scale data. As the RDF format grows in popularity, efforts continue to overcome these shortcomings. From the viewpoint of traditional relational databases, many of the restrictions stem from RDF's own storage model, which is based on triples. Recent studies have shown that column-store support for RDF data management can effectively address these problems, and researchers have made further improvements on the column store, for example by building six indexes on it. These improvements inevitably introduce new problems, however: storage space and efficiency issues have not been effectively addressed.

Building on traditional RDF storage methods and existing research, this thesis proposes a new RDF storage strategy, the ninefold (nonuple) index storage structure. Nonuple-index data management is a further improvement on the original triple structure and on column-store data management, and its design has been validated experimentally. It retains the characteristics of column-store data management and of the six-index structure, and the index structure itself makes query processing more complete. At the cost of some additional storage space, the nonuple index structure makes query processing more efficient.

The nonuple index storage structure sets up three-level index structures and two-level (secondary) index structures according to the attributes of the triples parsed from the RDF data set.
The three-level (third-class) indexes build their levels according to the permutations of subject, predicate and object. Each two-level (second-class) index is keyed on one of subject, predicate or object, with the other two attributes stored side by side as the second level. The key property of the two-level index is that its two second-level attributes are parallel and equivalent. When a triple contains exactly one constant attribute, a three-level index has to be matched against the whole data set to obtain the two unknowns: all the data under the two lower index levels satisfies the condition, so every entry must be compared. Once the data reaches a certain volume, the growing resource cost means the matching objects can no longer be retrieved quickly. Adding the two-level index resolves this problem. When one constant is specified in the triple, all the relevant information is located directly under that constant's entry in the two-level index. Because of the way the two-level index is structured, there is no need to traverse resources as in the three-level index. This structure reduces query time and, to a certain extent, lowers the complexity of the algorithm. Although the nonuple index requires somewhat more storage space, the trade-off is worthwhile.

The advantages of the nonuple index structure are also evident. First, a query can select the most suitable index structure according to its own characteristics. Second, it reduces I/O cost. Third, it effectively addresses the multi-value problem. Fourth, it reduces the number of join and merge operations. Finally, it avoids redundant NULL values.
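As a minimal sketch of the layout described above, the following Python builds the nine indexes in memory. It assumes they consist of six three-level (third-class) indexes, one per ordering of subject, predicate and object, plus three two-level (second-class) indexes keyed on a single attribute. The function names, the nested-dictionary representation and the lookup helpers are illustrative assumptions, not the column-store implementation used in the thesis.

```python
from collections import defaultdict
from itertools import permutations

# Position of each attribute inside an (s, p, o) tuple.
POSITIONS = {"s": 0, "p": 1, "o": 2}


def build_nonuple_index(triples):
    """Build six three-level and three two-level indexes over (s, p, o) triples."""
    triples = list(triples)

    # Three-level indexes: one per permutation, e.g. "spo" maps s -> p -> {o}.
    three_level = {}
    for order in permutations("spo"):
        a, b, c = (POSITIONS[k] for k in order)
        index = defaultdict(lambda: defaultdict(set))
        for t in triples:
            index[t[a]][t[b]].add(t[c])
        three_level["".join(order)] = index

    # Two-level indexes: one attribute as key, the other two kept side by side.
    two_level = {}
    for key in "spo":
        x, y = (POSITIONS[k] for k in "spo" if k != key)
        index = defaultdict(set)
        for t in triples:
            index[t[POSITIONS[key]]].add((t[x], t[y]))
        two_level[key] = index

    return three_level, two_level


def lookup_two_level(two_level, key, value):
    """One constant: read the paired attributes directly from a two-level index."""
    return set(two_level[key].get(value, set()))


def lookup_three_level(three_level, order, value):
    """Same answer from a three-level index, walking every second-level entry."""
    pairs = set()
    for second, thirds in three_level[order].get(value, {}).items():
        for third in thirds:
            pairs.add((second, third))
    return pairs
```

With a single constant such as a known subject, `lookup_two_level(two_level, "s", subject)` returns the (predicate, object) pairs in one step, whereas `lookup_three_level(three_level, "spo", subject)` has to walk each predicate entry to assemble the same pairs. The extra three indexes are where the additional storage cost comes from.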
The advantages of the nonuple index are discussed in the thesis. In addition, the thesis proposes a SPARQL query engine based on the nonuple index and summarizes the query processing algorithms used in the engine. One of the most important parts of the query engine is the design of its interfaces, because good interfaces are needed for parsing the data sets, building the index structures, analyzing the SPARQL input, and matching the information sets appropriately.

The first task of the query engine is to analyze and classify the RDF data sets. On this basis, the engine builds the index structures from the parsed triples. On the input side, the engine matches each query to an index structure using the algorithm presented in the thesis, so that different SPARQL queries are routed to specific index structures. For example, when a triple in the WHERE clause of a SPARQL query contains one variable attribute, the query engine analyzes the input and matches the triple to the two-level index structure. When the number of variable attributes in the triple is two, the query is matched to the three-level index structures.

Because the amount of data may be large, the thesis adopts a B+ tree for the matching process; owing to its characteristics, the B+ tree is widely used for search and query. Before the B+ tree is applied, the query engine processes the parsed attributes: following a uniform rule, every attribute is assigned a unique ID. When an attribute is queried, the engine searches by its ID number, which simplifies parsing and querying and also compresses storage. After matching, the query engine filters the selected information sets.
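The ID assignment and ordered lookup described above can be sketched roughly as follows. This is an illustration under stated assumptions: `TermDictionary`, `encode_triples` and `match_prefix` are hypothetical names, and a sorted list searched with the standard-library `bisect` module stands in for the B+ tree, since it offers the same kind of ordered prefix scan without being the structure the thesis uses.

```python
from bisect import bisect_left


class TermDictionary:
    """Assigns each distinct attribute value a unique integer ID (hypothetical helper)."""

    def __init__(self):
        self._id_of = {}    # term -> ID
        self._term_of = []  # ID -> term

    def encode(self, term):
        """Return the ID for a term, allocating a new one on first sight."""
        if term not in self._id_of:
            self._id_of[term] = len(self._term_of)
            self._term_of.append(term)
        return self._id_of[term]

    def lookup(self, term):
        """Return the ID for a known term, or None if it was never encoded."""
        return self._id_of.get(term)

    def decode(self, term_id):
        return self._term_of[term_id]


def encode_triples(triples, dictionary):
    """Replace every attribute by its ID and keep the triples in sorted order."""
    return sorted(tuple(dictionary.encode(t) for t in triple) for triple in triples)


def match_prefix(sorted_triples, prefix):
    """Return the encoded triples that start with `prefix`.

    Binary search finds the first candidate; the scan stops at the first
    triple that no longer shares the prefix (a stand-in for a B+ tree range scan).
    """
    start = bisect_left(sorted_triples, prefix)
    matches = []
    for triple in sorted_triples[start:]:
        if triple[:len(prefix)] != prefix:
            break
        matches.append(triple)
    return matches
```

A query term is first translated with `dictionary.lookup(...)`; a result of `None` means the term does not occur in the data, so the pattern cannot match and no index needs to be touched. Matching then compares small integers rather than full strings, which is one way the ID scheme can both simplify lookups and reduce storage.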
Once the data sets have been filtered, the final result set is displayed.

The storage strategy for a data set is an important factor in improving the efficiency of semantic queries. Nonuple-index data management further improves on the original index structure, building on the ideas of column-store data management, and sets up its index structures according to the SPARQL query language. The query engine implements this idea, and the experimental results confirm its feasibility and validity.
Keywords/Search Tags: Semantic Web, The storage structure of nonuple indexing, SPARQL, Query Engine