
Research And Implementation Of Keyword Index For XML Document In Relation-XML Dual Engine Database Management System CoSQLRX

Posted on: 2011-12-12
Degree: Master
Type: Thesis
Country: China
Candidate: H D Yu
Full Text: PDF
GTID: 2248330395957963
Subject: Computer system architecture
Abstract/Summary:
As informatization accelerates, XML technology plays an increasingly important role. In fields such as electronic commerce, medical treatment, telecommunications, and press and publishing, XML is now widely used for data presentation, data exchange, modeling, and analysis. At the same time, people who use search engines or the query functions of a database are accustomed to searching by keyword rather than with complicated syntax or schema information. Keyword search simplifies database queries and is far more convenient for users: they do not need to master the complicated syntax for searching XML data, which makes querying more efficient and more complete.

Traditional relational databases are a mature technology. Keyword queries over relational databases offer users a convenient service and have good prospects. XML, meanwhile, has become a standard for data representation and is developing vigorously. It is therefore important to unify relational and XML data and to meet the need for keyword retrieval. This thesis builds on CoSQLRX, a system developed by Peking University, Northeastern University, and the Honeston corporation. It designs a keyword index and, drawing on relevant techniques from PostgreSQL, adds keyword search to CoSQLRX.

First, the features of the CoSQLRX system are described, including its architecture and its support for XML; the design of the keyword index on XML data is then presented on that basis. Taking the structure index as its foundation, the system segments XML document nodes, records their encoding information, and builds the relationship between text and document structure.
Because text information affects query results, the number of occurrences of each word and the positions of those occurrences are recorded in the index items. This implements the query function and supports computing the importance of results.

To complete the design, the system uses TOAST technology, through which the structure index is inserted into the XML document. The inverted index is organized as an entry tree, whose leaf nodes can take the form of a list or a B+tree. On insertion, the number of index items adapts to the amount of data inserted into the document. On the query side, this thesis considers the characteristics of single-word and multi-word queries. For phrase queries in particular, it uses the word-position information stored in the index to distinguish between items.

Finally, the work is tested and verified by experiment. The keyword index for XML documents is tested with XMark. The results show that the index's time and space consumption are stable and that the index scales gracefully.
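
The role of word positions in phrase queries can be illustrated with a small sketch. This is not the CoSQLRX implementation: the class name `PositionalIndex`, its methods, the whitespace tokenizer, and the Dewey-style node labels such as "1.1" are all assumptions made for illustration, and a plain in-memory dictionary stands in for the entry tree and its list or B+tree leaves.

```python
from collections import defaultdict


class PositionalIndex:
    """Toy inverted index: word -> {node_id -> [word positions]}.

    Hypothetical illustration only; node_id stands in for the node
    encoding that a structure index would supply.
    """

    def __init__(self):
        self.postings = defaultdict(dict)

    def add(self, node_id, text):
        # Record every word together with its position in the node's text.
        for pos, word in enumerate(text.lower().split()):
            self.postings[word].setdefault(node_id, []).append(pos)

    def search_word(self, word):
        # Single-word query: rank nodes by occurrence count, a crude
        # stand-in for a result-importance score.
        hits = self.postings.get(word.lower(), {})
        return sorted(hits, key=lambda n: (-len(hits[n]), n))

    def search_all(self, words):
        # Multi-word query: nodes that contain every word, in any order.
        node_sets = [set(self.postings.get(w.lower(), {})) for w in words]
        return sorted(set.intersection(*node_sets)) if node_sets else []

    def search_phrase(self, phrase):
        # Phrase query: the words must appear at consecutive positions.
        words = phrase.lower().split()
        if not words:
            return []
        result = []
        for node_id, starts in self.postings.get(words[0], {}).items():
            for start in starts:
                if all(
                    start + i in self.postings.get(w, {}).get(node_id, [])
                    for i, w in enumerate(words[1:], 1)
                ):
                    result.append(node_id)
                    break
        return sorted(result)


index = PositionalIndex()
index.add("1.1", "inverted index for XML")
index.add("1.2", "index of inverted lists and index trees")
print(index.search_all(["inverted", "index"]))
print(index.search_phrase("inverted index"))
```

Both nodes contain the two words, so the multi-word query returns both, while only the node in which "inverted" is immediately followed by "index" satisfies the phrase query. The stored occurrence counts are what allow a ranking step such as the one in `search_word`.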
Keywords/Search Tags: XML, keyword retrieval, inverted index