Research Of Data Mining Techniques For XML Documents

Posted on: 2006-03-19
Degree: Master
Type: Thesis
Country: China
Candidate: L Guo
Full Text: PDF
GTID: 2168360152475712
Subject: Computer application technology
Abstract/Summary:
Data mining is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data. It is an efficient method for resolving the "data rich, information poor" problem. Over the last decades, data mining has been studied extensively in theory and practice and applied to fields such as business, industry, and the natural sciences.

XML (Extensible Markup Language) is a simple, very flexible text format derived from SGML. Originally designed to meet the challenges of large-scale electronic publishing, XML now plays an increasingly important role in the exchange and representation of a wide variety of data on the Web and elsewhere, owing to its extensibility, platform independence, flexibility, simplicity, standardization, and expressive power for representing data. Hence there is increasing demand for efficient methods that extract rules and patterns from XML data, namely XML data mining.

However, XML data are semi-structured: they are large, complex, and heterogeneous, are modeled by trees, and cannot easily be mapped into a relational framework. Traditional data mining methods designed for relational databases, such as Apriori, therefore cannot be applied to XML data directly. Developing efficient and scalable methods for XML data mining is thus an important challenge.

Based on the above analysis, this paper first introduces the basic theory of XML, the features of XML documents, and traditional data mining technology. It then models XML data as labeled trees and proposes a new methodology for extracting association rules from XML documents, based on the concept hierarchy used in traditional data mining methods. We present preliminary experiments showing that our method can extract association rules from XML documents effectively.
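The abstract does not give the details of the proposed method, so the following is only a minimal sketch of the general idea, not the author's algorithm. It assumes that each child of the root element is one transaction, that leaf text values are the items, and that the concept hierarchy is a simple child-to-parent mapping. Each transaction is extended with the ancestors of its items, in the style of generalized association rules, and only rules between two single items are considered. The sample data, the hierarchy, the thresholds, and all function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from itertools import combinations

# Hypothetical sample data: each <order> is treated as one transaction.
XML_DATA = """
<orders>
  <order><item>milk</item><item>bread</item></order>
  <order><item>yogurt</item><item>bread</item></order>
  <order><item>milk</item><item>bagel</item></order>
  <order><item>yogurt</item><item>bagel</item></order>
</orders>
"""

# Hypothetical concept hierarchy, stored as child -> parent.
HIERARCHY = {"milk": "dairy", "yogurt": "dairy",
             "bread": "bakery", "bagel": "bakery"}


def ancestors(item, hierarchy):
    """Return every ancestor of item, nearest first."""
    result = []
    while item in hierarchy:
        item = hierarchy[item]
        result.append(item)
    return result


def transactions_from_xml(xml_text, hierarchy):
    """Walk the XML tree and build one item set per child of the root."""
    root = ET.fromstring(xml_text)
    transactions = []
    for record in root:
        items = set()
        for node in record.iter():
            # A leaf is an element without child elements.
            if len(node) == 0 and node.text and node.text.strip():
                value = node.text.strip()
                items.add(value)
                items.update(ancestors(value, hierarchy))
        transactions.append(frozenset(items))
    return transactions


def mine_rules(transactions, hierarchy, min_support=0.5, min_confidence=0.8):
    """Return (lhs, rhs, support, confidence) for single-item rules."""
    n = len(transactions)
    single, pair = Counter(), Counter()
    for t in transactions:
        single.update(t)
        pair.update(combinations(sorted(t), 2))
    rules = []
    for (a, b), count in pair.items():
        support = count / n
        if support < min_support:
            continue
        # An item always co-occurs with its own ancestors, so skip those pairs.
        if a in ancestors(b, hierarchy) or b in ancestors(a, hierarchy):
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / single[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


if __name__ == "__main__":
    txs = transactions_from_xml(XML_DATA, HIERARCHY)
    for lhs, rhs, support, confidence in mine_rules(txs, HIERARCHY):
        print(f"{lhs} => {rhs} (support={support:.2f}, confidence={confidence:.2f})")
```

In this toy data no pair of leaf items reaches the support threshold, since each such pair occurs in only one of the four orders, while rules involving the generalized concepts do. This is the usual motivation for bringing a concept hierarchy into association rule mining.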
Keywords/Search Tags: Data Mining, XML, Association rules, Concept Hierarchy