Research Of Similarity In XML Documents And Its Application In Software Component Clustering

Posted on: 2009-02-26
Degree: Master
Type: Thesis
Country: China
Candidate: H S Liu
GTID: 2178360245999995
Subject: Computer application technology
Abstract/Summary:
With the spread and continued development of component-based software development, component repositories have received increasing attention from the software research community. Component retrieval is the basic function of a component repository; clustering components automatically classifies them and improves retrieval efficiency. A component can be described by an XML document, and that description document can serve as the identifier of the component, so component clustering can be converted into XML document clustering. Research on XML document similarity and component clustering is therefore important and valuable.

An XML document combines structural and semantic information, so a similarity measure for XML documents needs to take both its semantic and structural features into account. First, because repeated non-leaf nodes in an XML document complicate the similarity computation, this thesis simplifies the document structure by reducing repeated and nested non-leaf nodes. Then, based on an analysis of the structural characteristics of XML documents, it extends the earlier recursive structure model: ordered nested elements reflect the document structure; the model considers labels, label weights, leaf-node content, and leaf-node weights; WordNet and SD are used to compute the semantic information of labels; and weights are assigned to labels and leaf nodes according to their structural characteristics. Finally, the thesis presents the gradational recursive algorithm for XML document similarity.

The thesis adopts a general facet-based description scheme to describe components and uses XML as the component description markup language. A similarity matrix is obtained from the gradational recursive algorithm, and the documents are then clustered with a hierarchical clustering algorithm. Experiments on a test system for a component repository showed that the gradational recursive algorithm can effectively calculate the similarity of XML documents and meets the requirements of software component clustering.
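
The pipeline summarized in the abstract (structure reduction, recursive similarity, similarity matrix, hierarchical clustering) can be illustrated with a minimal Python sketch. It is not the thesis's gradational recursive algorithm. Several parts are assumptions made for illustration: label and leaf similarity use a plain string ratio from `difflib` in place of WordNet and SD, the label weight is a fixed constant (`LABEL_WEIGHT`) rather than one derived from document structure, children are matched by best score regardless of order, and clustering uses average linkage with a hypothetical stopping threshold. All function names and sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET
from difflib import SequenceMatcher

LABEL_WEIGHT = 0.5  # assumed fixed weight; the thesis derives weights from structure


def reduce_repeats(elem):
    """Drop later non-leaf siblings that repeat an earlier sibling's tag (in place)."""
    seen = set()
    for child in list(elem):
        if len(child) > 0:  # non-leaf node
            if child.tag in seen:
                elem.remove(child)
                continue
            seen.add(child.tag)
            reduce_repeats(child)
    return elem


def text_sim(a, b):
    """String ratio in [0, 1]; a stand-in for a WordNet-based semantic measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def similarity(a, b):
    """Recursively combine label similarity with content or child similarity."""
    kids_a, kids_b = list(a), list(b)
    if not kids_a and not kids_b:
        # Two leaves: compare their text content.
        content = text_sim((a.text or "").strip(), (b.text or "").strip())
    elif not kids_a or not kids_b:
        # A leaf against a non-leaf: no comparable content.
        content = 0.0
    else:
        # Score each child by its best match on the other side, in both directions.
        forward = sum(max(similarity(x, y) for y in kids_b) for x in kids_a)
        backward = sum(max(similarity(y, x) for x in kids_a) for y in kids_b)
        content = (forward + backward) / (len(kids_a) + len(kids_b))
    return LABEL_WEIGHT * text_sim(a.tag, b.tag) + (1 - LABEL_WEIGHT) * content


def cluster(sim, threshold):
    """Average-linkage agglomerative clustering over a similarity matrix."""
    clusters = [[i] for i in range(len(sim))]

    def link(c1, c2):
        return sum(sim[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        best, p, q = max(
            (link(clusters[p], clusters[q]), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        if best < threshold:
            break
        merged = clusters.pop(q)  # q > p, so index p stays valid
        clusters[p].extend(merged)
    return clusters


# Hypothetical component descriptions, used only to exercise the sketch.
DOCS = [
    "<component><name>StackList</name><function>push pop</function></component>",
    "<component><name>StackArray</name><function>push pop</function></component>",
    "<widget><color>red</color></widget>",
]
trees = [reduce_repeats(ET.fromstring(d)) for d in DOCS]
matrix = [[similarity(a, b) for b in trees] for a in trees]
clusters = cluster(matrix, threshold=0.6)
print(clusters)
```

With these sample documents, the two stack components end up in one cluster and the unrelated `widget` document stays on its own. The threshold of 0.6 is arbitrary, and a real system would tune it or cut the cluster hierarchy at a chosen level.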
Keywords/Search Tags: XML document, Similarity, Gradational recursive algorithm, Clustering algorithm, Software component