Research On The XML Document Information Retrieval Technology

Posted on: 2008-02-10
Degree: Master
Type: Thesis
Country: China
Candidate: J Chen
GTID: 2178360215991527
Subject: Computer software and theory
Abstract/Summary:
Resources on the Internet have grown rapidly since it first appeared, and finding and using them through retrieval technology has become a pressing problem. XML has several advantages over HTML. It separates content, structure, and presentation, which makes it better suited to representing, exchanging, and storing data. It is also a semi-structured document format whose structural information is both machine-readable and helpful to human readers. XML is, moreover, better suited than HTML to retrieval on the network. As a result, more and more data is described, stored, exchanged, and represented in XML. It is gradually becoming a new star of the Web and may become the standard for representing, storing, integrating, and exchanging Web data.

There are already research results in XML indexing, querying, and storage on how to use, process, and parse XML data effectively. Building on that work and on the characteristics of XML, this thesis studies the implementation of XML document retrieval from three angles: retrieval theory, mathematical model, and database implementation. For the main techniques of the XML document retrieval model, the thesis presents a general framework and the main functional modules of an XML document information retrieval system for Chinese and English, and proposes a new indexing technique and a method for fuzzy queries. Specifically, storing the data in a database simplifies the implementation of parallel processing, data recovery, and transaction processing. After analyzing the shortcomings of two indexing techniques based on relational databases, the thesis proposes a new one that aims for the best balance between query efficiency and space cost. Building on the tree embedding matching model and the tree inclusion matching model, it proposes a tree extended inclusion matching model that supports fuzzy queries. Because XML documents are structured, the distribution of keywords is added to the relevance (pertinence) computation in an improved Vector Space Model. Finally, the thesis designs and builds a prototype retrieval system based on the improved Vector Space Model.
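
One way to read the keyword-distribution idea is as a term vector in which each occurrence of a keyword is weighted by the element it appears in. The abstract does not give the thesis's actual formula, so the following Python sketch is only an illustration under assumptions: the tag weights in `TAG_WEIGHTS`, the function names, and the use of plain cosine similarity are all hypothetical.

```python
import math
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical weights (not from the thesis): a keyword found in <title>
# counts more than the same keyword found in an unlisted element.
TAG_WEIGHTS = {"title": 3.0, "abstract": 2.0}
DEFAULT_WEIGHT = 1.0


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def weighted_term_vector(xml_text):
    """Build a term vector; each occurrence is scaled by its element's weight.

    Only the text directly inside an element is counted. Tail text in
    mixed content is ignored to keep the sketch short.
    """
    root = ET.fromstring(xml_text)
    vector = Counter()
    for elem in root.iter():
        weight = TAG_WEIGHTS.get(elem.tag, DEFAULT_WEIGHT)
        for term in tokenize(elem.text or ""):
            vector[term] += weight
    return vector


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, documents):
    """Return (score, name) pairs, highest score first."""
    q = Counter(tokenize(query))
    scored = [
        (cosine(q, weighted_term_vector(doc)), name)
        for name, doc in documents.items()
    ]
    return sorted(scored, reverse=True)


docs = {
    "a": "<doc><title>XML retrieval</title><body>storage of data</body></doc>",
    "b": "<doc><title>storage of data</title><body>XML retrieval</body></doc>",
}
for score, name in rank("XML retrieval", docs):
    print(name, round(score, 3))
```

Both sample documents contain the same words, so an unweighted model would score them identically. With the assumed weights, the document that carries the query terms in its title ranks first.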
Keywords/Search Tags:XML, retrieval, index, tree model, matching, pertinence