
A New Method Of XML Similarity Measurement And Search Technology Based On Vector Space

Posted on: 2008-04-28  Degree: Master  Type: Thesis
Country: China  Candidate: Y L Ma  Full Text: PDF
GTID: 2178360212492858  Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet, XML, a semi-structured markup language, can express all kinds of information and data and enables applications to work together. As a result, XML has become the de facto standard for data representation and data exchange over the web, and XML as a new data model has become a research hotspot. Among XML applications, data query is an important aspect of XML technology, and calculating the similarity between XML documents is the foundation of XML document analysis and management.

In XML query technology, exact-match XML search is very mature, is widely used in text search, and has proven to be a very good XML document retrieval technology. Inexact-match search of XML documents, however, is still in its initial stages, and many problems remain, such as the efficiency of XML search, the accuracy of XML search, and the integrity of XML documents. Many studies have shown that keyword proximity search, one of the inexact-match search techniques, is well suited to the labeled-tree structure of XML documents. Following the idea of keyword proximity search, we propose a new XML similarity measurement method in this paper and design a search algorithm based on it.

First, as in traditional XML search algorithms, we represent XML documents as XML trees and, at the same time, assign weights to the branches of the XML path hierarchy. Second, each XML path hierarchy is mapped to a vector, and the XML document set is mapped to a matrix space; this simplifies the XML document similarity calculation and improves search efficiency. Then, through matrix transformation, the vector space is reduced, and the search space relative to the vector space is reduced with it. In this way, we reduce the search space and improve the efficiency of the keyword proximity technique. Finally, we test the technique, examine the search results, and give a summary.
Keywords/Search Tags:XML, XML Query Technology, Keyword Searching Technology, Proximity Search, Vector Space
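
The mapping described in the abstract (XML tree, weighted path hierarchy, vector per document, matrix per document set) can be illustrated with a minimal sketch. Everything specific here is an assumption, since the abstract does not give the thesis's actual formulas: the function names `path_weights`, `to_matrix`, and `cosine` are hypothetical, the weighting rule (each level down the tree multiplies the weight by `decay`) is invented for illustration, and cosine similarity stands in for the thesis's similarity measure. The matrix-transformation step that reduces the vector space is not shown, because the abstract does not say which transformation is used.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter


def path_weights(xml_text, decay=0.5):
    """Map each root-to-node tag path to a depth-decayed weight.

    Assumption: a node at depth d contributes decay**d to its path.
    The thesis's real branch-weighting scheme is not given in the abstract.
    """
    root = ET.fromstring(xml_text)
    weights = Counter()
    stack = [(root, root.tag, 1.0)]
    while stack:
        node, path, weight = stack.pop()
        weights[path] += weight
        for child in node:
            stack.append((child, path + "/" + child.tag, weight * decay))
    return weights


def to_matrix(docs, decay=0.5):
    """Map a document set to (path vocabulary, one row vector per document)."""
    per_doc = [path_weights(doc, decay) for doc in docs]
    vocab = sorted(set().union(*per_doc))
    matrix = [[pw.get(path, 0.0) for path in vocab] for pw in per_doc]
    return vocab, matrix


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

With this representation, comparing two documents reduces to comparing two rows of the matrix, for example `cosine(matrix[0], matrix[1])`, so no tree traversal is needed at query time.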