
Clustering Research Of Semi Structured Data And Its Application In Product Design

Posted on: 2016-12-22 | Degree: Master | Type: Thesis
Country: China | Candidate: B Chen | Full Text: PDF
GTID: 2348330488473338 | Subject: Engineering
Abstract/Summary:
With the rapid development of computer network and database technology, the volume of semi-structured data accumulated in various fields has increased dramatically. Knowledge discovery methods are urgently needed, and data mining provides a powerful tool for this purpose. Data mining algorithms can extract information from large volumes of semi-structured data and discover the knowledge latent in it, which in turn gives decision makers sound data support.

There are many types of semi-structured data, and the XML document is a typical representative. This thesis therefore takes XML documents as its research object and studies clustering methods for XML-based product design documents. An XML document combines structure information and content information, so both should be considered when clustering XML documents. XML document clustering involves three stages: document representation, similarity computation, and the clustering process itself. This thesis analyzes XML document clustering along these three stages.

First, the advantages and disadvantages of the tree-structure and path-set representations of XML documents are analyzed and summarized. Based on an analysis of the characteristics of XML documents, the representation method is improved by adding parent-node and level information, so that the represented information is more complete and accurate.

Second, in the similarity computation stage, the semantic information of tags is added so that tag information is fully taken into account, and the semantic similarity of tags is calculated with a semantic dictionary. For XML-based product design documents, a general-purpose semantic dictionary lacks the relevant domain-specific terms, so this thesis extends the tag semantic similarity computation by adding a dictionary of domain-specific terms. In addition, the same child node may have inconsistent parent-node information in two documents, which affects the similarity result. An analysis of the characteristics of XML-based product design documents shows that parent-node information can be abstracted further and replaced with more abstract information. To address this problem, when the document representation is constructed, nodes that are not professional terms are reduced with the help of the added professional term dictionary.

Third, a clustering model for XML-based product description documents is introduced, and the design and implementation of clustering for these documents are presented. The results are compared with those obtained by clustering.
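The representation and similarity stages described in the abstract can be illustrated with a short Python sketch. It is not the thesis's implementation. The dictionary `DOMAIN_TERMS`, the helper names, and the sample documents are hypothetical. Plain Jaccard overlap stands in for the dictionary-based semantic similarity, which the abstract does not specify in detail. Each node is represented by its tag, its parent's tag, and its level, and tags are first mapped to a shared domain concept.

```python
import xml.etree.ElementTree as ET

# Hypothetical domain term dictionary: maps specialized tags to a shared concept.
DOMAIN_TERMS = {
    "gearbox": "transmission",
    "transmission": "transmission",
    "bolt": "fastener",
    "screw": "fastener",
}


def normalize(tag):
    """Map a tag to its domain concept, or return it lower-cased if unknown."""
    tag = tag.lower()
    return DOMAIN_TERMS.get(tag, tag)


def node_features(xml_text):
    """Return the set of (tag, parent tag, level) triples for one document."""
    root = ET.fromstring(xml_text)
    features = set()
    stack = [(root, None, 0)]  # (element, normalized parent tag, depth)
    while stack:
        node, parent, level = stack.pop()
        tag = normalize(node.tag)
        features.add((tag, parent, level))
        for child in node:
            stack.append((child, tag, level + 1))
    return features


def similarity(xml_a, xml_b):
    """Jaccard similarity of the two feature sets (an assumed stand-in measure)."""
    a, b = node_features(xml_a), node_features(xml_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical product design documents.
doc_a = "<product><gearbox><bolt/></gearbox></product>"
doc_b = "<product><transmission><screw/></transmission></product>"
doc_c = "<product><housing><paint/></housing></product>"

print(similarity(doc_a, doc_b))  # same structure after term normalization
print(similarity(doc_a, doc_c))  # only the root node is shared
```

In this sketch, `doc_a` and `doc_b` use different tags but receive identical feature sets once the domain dictionary maps them to common concepts, which is the effect the abstract attributes to adding domain-specific terms. A pairwise similarity of this kind could then feed any standard clustering procedure.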
Keywords/Search Tags: semi structured data, XML documents, tags, semantic similarity, clustering