
The Study Of Keyword Search In XML And Its Implementation In Native XML Database

Posted on: 2008-08-21
Degree: Master
Type: Thesis
Country: China
Candidate: H Dong
Full Text: PDF
GTID: 2178360212995893
Subject: Computer application technology
Abstract/Summary:
Today people can easily publish and obtain information on the Web, which has become the main platform for doing so. As the volume of Web information keeps growing, searching HTML pages with traditional information retrieval techniques has become inefficient and inaccurate. The emergence of XML has alleviated this problem to some extent, and XML has gradually become an active research topic. Many universities, research institutes, and major database vendors abroad now run projects on XML technology, but keyword search over XML data is still rarely addressed and few results have been published. Meanwhile, a native XML database uses XML technology throughout, from the database core layer up to its query language, with a storage model and query implementation designed specifically for XML, so it is better suited to managing XML data. Native XML databases still have shortcomings, however, so research on them has significant practical value.

The main contents of this thesis are as follows. First, keyword search over XML data is studied in depth: the lowest common ancestor (LCA) problem in XML keyword retrieval is solved with a method based on range minimum queries, and the keyword retrieval algorithm NLCA is proposed. Second, native XML database technology is studied in depth: the storage techniques of native XML databases are analyzed, a system architecture is established, and the storage system of a native XML database is designed. Third, the NLCA algorithm is implemented in the native XML database, and experiments are used to analyze the storage performance of the database and the retrieval efficiency of NLCA.

Traditional information retrieval is mostly based on HTML and searches at the granularity of whole documents: the result returned is a document that contains a given keyword. Keyword retrieval over XML data works at the granularity of elements, so it returns only the part of the document that contains the keywords rather than the whole document, which speeds up the search. Compared with XML query languages such as XQuery, XPath, and XQL, the main advantage of XML keyword retrieval is that users need neither learn a complex query language nor understand the underlying structure of the XML document in depth; they only enter keywords related to the content they are interested in. XML keyword retrieval is simple and practical because it does not require knowledge of the XML schema.

Because XML is tree-structured, keyword retrieval over XML should return the most relevant result to the user, which is usually the smallest subtree that contains the keywords. This problem can be reduced to the classic lowest common ancestor problem. Earlier XML keyword retrieval systems such as XRank and XKSearch compute the LCA of any two nodes using Dewey encoding, in which the code of each node has the code of its parent as a prefix, so the LCA of two nodes is the node denoted by the longest common prefix of their codes. The advantage of this encoding is that the LCA of two given nodes can be found just by comparing their codes. As node depth increases, however, computing the longest common prefix takes more time, and storing the codes takes a large amount of space. This thesis therefore proposes a method based on range queries to solve the LCA problem in XML keyword retrieval.
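To illustrate the Dewey-based baseline described above, here is a minimal sketch in Python. It assumes Dewey codes are represented as tuples of integers; the function name `dewey_lca` and the sample codes are hypothetical and are not taken from XRank, XKSearch, or the thesis.

```python
from typing import Tuple

Dewey = Tuple[int, ...]  # assumed representation: (1, 2, 3) stands for node 1.2.3


def dewey_lca(a: Dewey, b: Dewey) -> Dewey:
    """Return the Dewey code of the LCA: the longest common prefix of a and b."""
    common = []
    for x, y in zip(a, b):
        if x != y:
            break
        common.append(x)
    return tuple(common)


# Hypothetical codes for two elements that each contain one keyword.
title_node = (1, 2, 3)
author_node = (1, 2, 5, 1)
smallest_subtree_root = dewey_lca(title_node, author_node)  # (1, 2)
```

The loop compares one component per tree level, so the cost of each LCA grows with node depth, and every node stores a code whose length equals its depth. These are the two costs the abstract attributes to Dewey encoding.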
The improved keyword retrieval technique, NLCA, applies the RMQ-based (Range Minimum Query) method of computing the LCA to XML documents, which gives an algorithm for finding the LCA of any two nodes in an XML tree; the NLCA keyword retrieval algorithm is built on this. That algorithm, however, must wait until all NLCA nodes have been produced before it can remove the ancestor nodes among them. A non-blocking algorithm avoids this problem, so the same method is further applied to a non-blocking LCA computation, which yields a non-blocking RMQ-based LCA algorithm. Its time complexity is also analyzed: for an XML document with n elements, after O(n) preprocessing each LCA query is answered in O(1) time. The core operations that dominate the algorithm's cost, such as lca, descendant, and preorder comparison, change from comparisons of Dewey codes into simple table lookups and integer comparisons, which speeds up the algorithm considerably.

A native XML database uses XML technology throughout, from the database core layer up to its query language, with a storage model and query implementation designed specifically for XML, so it is better suited to managing XML data. This thesis therefore designs a native XML database that stores and retrieves XML data with a suitable storage model and adds a keyword retrieval function, which is implemented mainly with the keyword retrieval algorithm proposed here. The system architecture is open: new retrieval functions that support different query techniques can be added easily. The system is also a full-featured XML data management system.
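To make the RMQ-based LCA computation behind NLCA more concrete, the following Python sketch records an Euler tour of the tree and answers LCA queries with a range minimum query over node depths. It is an illustration under stated assumptions, not the thesis's implementation: it uses a sparse table, which needs O(n log n) preprocessing rather than the O(n) the abstract reports, and the integer node ids, the `children` mapping, and all function names are hypothetical.

```python
from typing import Dict, List, Tuple


def build_euler_tour(children: Dict[int, List[int]], root: int
                     ) -> Tuple[List[int], List[int], Dict[int, int]]:
    """Iterative DFS that records each node on entry and on every return to it."""
    euler = [root]    # node ids in visiting order
    depth = [0]       # depth of the node at each Euler position
    first = {root: 0}  # first Euler position of each node
    stack = [(root, 0, iter(children.get(root, [])))]
    while stack:
        node, d, it = stack[-1]
        child = next(it, None)
        if child is None:
            stack.pop()
            if stack:  # back at the parent: record it again
                euler.append(stack[-1][0])
                depth.append(stack[-1][1])
        else:
            first[child] = len(euler)
            euler.append(child)
            depth.append(d + 1)
            stack.append((child, d + 1, iter(children.get(child, []))))
    return euler, depth, first


def build_sparse_table(depth: List[int]) -> List[List[int]]:
    """table[j][i] = position of the minimum depth in depth[i : i + 2**j]."""
    n = len(depth)
    table = [list(range(n))]
    j = 1
    while (1 << j) <= n:
        prev, half = table[j - 1], 1 << (j - 1)
        row = []
        for i in range(n - (1 << j) + 1):
            left, right = prev[i], prev[i + half]
            row.append(left if depth[left] <= depth[right] else right)
        table.append(row)
        j += 1
    return table


def lca(u: int, v: int, euler: List[int], depth: List[int],
        first: Dict[int, int], table: List[List[int]]) -> int:
    """O(1) query: shallowest node between the first visits of u and v."""
    lo, hi = sorted((first[u], first[v]))
    k = (hi - lo + 1).bit_length() - 1  # largest power of two fitting the range
    left, right = table[k][lo], table[k][hi - (1 << k) + 1]
    return euler[left if depth[left] <= depth[right] else right]


# Hypothetical element tree: 0 is the root; 1 and 2 are its children.
tree = {0: [1, 2], 1: [3, 4], 2: [5]}
euler, depth, first = build_euler_tour(tree, 0)
table = build_sparse_table(depth)
result = lca(3, 4, euler, depth, first, table)  # 1
```

Each query consists of two table lookups and one integer comparison, independent of node depth, and a descendant test can be expressed as `lca(u, v, ...) == u`.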
The system can run as a standalone XML database, and it also provides an API so that it can be embedded in applications as a database library linked into the program.

Several sets of experiments were run on a computer with a 2.80 GHz Intel Pentium 4 processor and 512 MB of memory. The experiments show that, compared with a traditional relational database, the native XML database designed in this thesis greatly improves storage efficiency. With the improved NLCA algorithm applied in the native XML database system, effective preprocessing eliminates the time factor d that Dewey encoding incurs when finding a common ancestor in XML keyword retrieval. In addition, because only the Euler sequence of the nodes is stored rather than Dewey codes, storage space is greatly reduced, and the speed and accuracy of keyword retrieval improve. Compared with other retrieval algorithms, the keyword retrieval algorithm proposed in this thesis is better suited to keyword search over large volumes of Web information.

In summary, through the study of XML keyword search, this thesis proposes the NLCA algorithm; through the study of native XML storage techniques, it designs a native XML database and implements the NLCA algorithm in it. The experimental data show that the proposed NLCA algorithm has application prospects in the field of keyword search.
Keywords/Search Tags: Implementation