Parallel XML Keyword Query Based on LCA

Posted on: 2015-01-31  Degree: Master  Type: Thesis
Country: China  Candidate: X W Ren  Full Text: PDF
GTID: 2268330431457567  Subject: Computer software and theory
Abstract/Summary:
With the development of the Internet, the explosive growth of information, and the worldwide use of web pages, semi-structured data is becoming increasingly common and important. The Internet lets us send and receive information from all over the world; however, information exchange runs into a problem: different platforms may use different data formats, which is the problem of data format heterogeneity. Traditional relational databases are stretched thin when dealing with this problem. XML was developed against this background, and its emergence provides theoretical and technical support for a solution. As the standard for semi-structured data, XML has been widely applied in Web data exchange, data storage, configuration files, online bookstores, e-commerce, and other fields. XML technology plays an increasingly important role in the IT environment and has gradually become the de facto standard for transferring and exchanging information on the Internet.

With the advent of the big data era, XML data is growing geometrically, and quickly finding the needed information in huge data sets has become very important. More and more researchers study XML query methods, research on XML query methods with higher efficiency and throughput has become increasingly important and urgent, and parallel querying has come into view.

XML queries fall into two kinds: queries based on the graph model and queries based on the tree model. Queries based on the tree model are currently the most studied. The tree model is the basis of the LCA (lowest common ancestor), which identifies the most compact fragment. There are many LCA-based query methods, such as those based on the query result set and those based on semantics. They propose new algorithms or improve existing ones so that query results are more complete, are returned faster, and better match the user's intention.
SLCA and VLCA query methods are built on semantic analysis and make the query results more complete and closer to the user's intention. XKeyword and the IL algorithm are improved algorithms that make queries faster. With the development of hardware and the spread of multi-core CPUs and GPUs, using parallel computing to improve query efficiency has attracted attention. Using parallel technology to optimize queries means relying on powerful hardware to support parallel XML queries; research on this approach is still rare, so it has great research value and development prospects. The latest research directions either query XML databases in parallel or store different XML fragments in a distributed network and process these fragments in parallel. This paper instead analyzes the properties that the LCA itself has in the document tree and puts forward a two-group parallel concept.

With the rapid development of GPU technology, especially since general-purpose computing on GPUs (GPGPU) was proposed and adopted, the GPU, thanks to its efficiency, provides powerful computing capacity for high-performance computing. GPU-based parallel optimization has therefore gradually become a research hotspot, and GPU-based parallel XML keyword query has also come into view. In view of these two points, this paper combines XML query technology with GPU parallel computing to improve the efficiency of XML queries, and puts forward an improved LCA algorithm that can be parallelized.

To realize the algorithm, first, because of the unique structure of XML documents, we need to encode each node in the XML document so that the code is not only a unique identifier for the node but also shows the structural relationships between nodes. Therefore, the paper selects Dewey encoding for XML documents. In addition, we use it to complete some simple operations between nodes.
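To illustrate why Dewey encoding exposes structural relationships, here is a minimal Python sketch. It assumes the usual convention that a node's Dewey code is its parent's code followed by the node's position among its siblings; the function names (`parse_dewey`, `is_ancestor`, `lca`) are hypothetical and are not taken from the thesis. Under that assumption, an ancestor test is a prefix test, and the LCA of two nodes is the longest common prefix of their codes.

```python
from typing import Tuple

# A Dewey code is modelled as a tuple of integers, e.g. "0.1.2" -> (0, 1, 2).
# This representation is an assumption made for the example.
Dewey = Tuple[int, ...]


def parse_dewey(code: str) -> Dewey:
    """Convert a dotted Dewey string such as "0.1.2" into a tuple of ints."""
    return tuple(int(part) for part in code.split("."))


def is_ancestor(a: Dewey, b: Dewey) -> bool:
    """Return True if node a is a proper ancestor of node b.

    With Dewey codes this is a prefix test: a must be strictly shorter
    than b and equal to the first len(a) components of b.
    """
    return len(a) < len(b) and b[:len(a)] == a


def lca(a: Dewey, b: Dewey) -> Dewey:
    """Return the Dewey code of the lowest common ancestor of a and b.

    The LCA is the longest common prefix of the two codes.
    """
    prefix = []
    for x, y in zip(a, b):
        if x != y:
            break
        prefix.append(x)
    return tuple(prefix)


if __name__ == "__main__":
    left = parse_dewey("0.1.2")
    right = parse_dewey("0.1.3.0")
    print(lca(left, right))            # common prefix of the two codes
    print(is_ancestor((0, 1), left))   # prefix test
```

Because both operations only compare components position by position, each pair of codes can be processed independently of every other pair, which is the kind of property that makes such work a candidate for parallel execution.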
Second, the codes of nodes with the same keyword in the XML document tree are indexed with a B+tree, compared as encoded strings, and stored in ascending order, so we need a suitable container to store the index. Because an embedded database can hold the inverted index and link seamlessly with the application process, we use the embedded database Berkeley DB. The index then runs in the same address space as the application, which eliminates the overhead of client/server configuration. The application does not need to establish a network connection to a database service beforehand; instead, saving, querying, modifying, and deleting data are done through the Berkeley DB library embedded in the program. In this way, the time to obtain the index can be ignored during the experiments, which weakens the negative impact of the index on the main experiment.

On the algorithm side, first, based on the characteristics of the XML document tree, this paper proposes a new method, same-keyword code column scanning, which improves the efficiency of the LCA query. Second, again based on the characteristics of the XML document tree, it puts forward a two-group parallel strategy and proves its feasibility. To demonstrate the feasibility of the algorithm, comparison experiments were carried out on two aspects: speedup ratio and query time. The experimental data show that our parallel model achieves a better speedup ratio and higher throughput than the CPU-based serial XML query model.
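The inverted index described above can be sketched in Python as follows. This is only an illustration under stated assumptions: plain in-memory sorted lists stand in for the B+tree stored in Berkeley DB (no Berkeley DB API is used), Dewey codes are integer tuples rather than encoded strings, and the tokenization (tag names plus whitespace-separated text, lowercased) and the function name `build_index` are hypothetical choices, not the thesis's implementation.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List, Tuple

Dewey = Tuple[int, ...]


def build_index(xml_text: str) -> Dict[str, List[Dewey]]:
    """Map each keyword to the ascending list of Dewey codes of the
    elements that contain it.

    Keywords are the element's tag name and the words of its leading
    text; tail text after child elements is ignored in this sketch.
    """
    root = ET.fromstring(xml_text)
    index: Dict[str, List[Dewey]] = defaultdict(list)

    def visit(elem: ET.Element, code: Dewey) -> None:
        words = {elem.tag.lower()}
        if elem.text:
            words.update(elem.text.lower().split())
        for word in words:
            index[word].append(code)
        # A child's code is the parent's code plus its sibling position.
        for position, child in enumerate(elem):
            visit(child, code + (position,))

    visit(root, (0,))
    # Tuples compare component by component, so sorting them yields
    # document order, with an ancestor placed before its descendants.
    return {word: sorted(codes) for word, codes in index.items()}


if __name__ == "__main__":
    sample = (
        "<shop>"
        "<book><title>XML basics</title></book>"
        "<book><title>GPU computing</title><note>XML</note></book>"
        "</shop>"
    )
    for keyword, codes in sorted(build_index(sample).items()):
        print(keyword, codes)
```

Integer tuples are used here because comparing dotted strings directly would place "0.10" before "0.2"; an index that compares encoded strings needs an encoding, such as fixed-width components, that preserves numeric order.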
Keywords/Search Tags: LCA query, GPU computing, Parallel optimization, CUDA