
Research Of Data Mining Using XML Frequently Changing Sections

Posted on: 2013-03-25
Degree: Master
Type: Thesis
Country: China
Candidate: J N Zhang
Full Text: PDF
GTID: 2248330371482744
Subject: Computer software and theory
Abstract/Summary:
XML has become the de facto standard for data storage and exchange because it is self-describing, extensible, and highly portable across platforms. The volume of data in XML form keeps growing, which makes discovering interesting knowledge in it a challenge. Traditional data mining, including classification, clustering, and association rule mining, works on data that either carries no structural information or follows a fixed format; semi-structured data has largely been left out. Moreover, XML documents change over time, and this dynamic behavior deserves attention.

This thesis studies XML data, taking into account both the structure and the content of a document and the changes to each. It presents several dynamic metrics, which introduce a Weight to differentiate the types of change. Frequently Changing Sections (FCS) are defined on the basis of these metrics, an approach is proposed to mine them, and a classifier is built on top of the FCS.

The thesis is organized around the concept of Frequently Changing Sections and covers the following:

1. The first two chapters present the background and basic knowledge. They begin with the current state of research and the shortcomings of existing work, then cover the basics of XML and a set of related technical standards, especially the DOM. They close with an introduction to the progress of data mining, the kinds of interesting knowledge it targets, and the common algorithms used to mine them.

2. The third chapter discusses Frequently Changing Sections. It first presents approaches for detecting the differences between XML documents; a difference can be represented as a sequence of edit operations. It then presents several dynamic metrics that consider both structure and content and that introduce a Weight to differentiate the types of change. An approach, SC-Mining, is proposed to mine Frequently Changing Sections. HSC-DOM is introduced to reduce the number of scans, and some optimization techniques are introduced to make the process more efficient.

3. The fourth chapter discusses classification based on Frequently Changing Sections. It first discusses the Vector Space Model with Frequently Changing Sections and then the algorithm for calculating similarity. Before classification, clustering is introduced to learn more about the classification. The classifier is composed of a Support Vector Machine and a Bayes classifier. The experiments show that the algorithm is efficient.
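
The idea of weighting edit operations to find frequently changing parts of a document can be illustrated with a small sketch. The abstract does not give the actual metrics or the SC-Mining algorithm, so everything below is an assumption made for illustration: the weight values, the representation of a delta as (operation, path) pairs, the averaging rule, and the function names `change_scores` and `frequently_changing` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical weights per edit-operation type (not taken from the thesis).
WEIGHTS = {"insert": 1.0, "delete": 1.0, "update": 0.5}


def change_scores(deltas):
    """Return a weighted change score per node path.

    `deltas` is a list of edit scripts, one per pair of consecutive
    document versions. Each edit script is a list of (op, path) tuples.
    The score of a path is the sum of its per-delta weights divided by
    the number of deltas (an assumed, simplified metric).
    """
    totals = defaultdict(float)
    for script in deltas:
        # Count each path at most once per delta, using its heaviest operation.
        heaviest = {}
        for op, path in script:
            heaviest[path] = max(heaviest.get(path, 0.0), WEIGHTS[op])
        for path, weight in heaviest.items():
            totals[path] += weight
    n = len(deltas)
    return {path: total / n for path, total in totals.items()}


def frequently_changing(deltas, threshold):
    """Return the sorted paths whose weighted score reaches `threshold`."""
    scores = change_scores(deltas)
    return sorted(path for path, score in scores.items() if score >= threshold)


if __name__ == "__main__":
    # Three deltas between four versions of a made-up catalog document.
    example = [
        [("update", "/catalog/item/price"), ("insert", "/catalog/item")],
        [("update", "/catalog/item/price")],
        [("update", "/catalog/item/price"), ("delete", "/catalog/note")],
    ]
    print(frequently_changing(example, threshold=0.5))
```

In this toy input the price node is updated in every delta, so it is the only path that reaches the threshold, while nodes inserted or deleted once fall below it. A real implementation would work on DOM subtrees rather than flat paths and would use the metrics defined in the thesis.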
Keywords/Search Tags: XML, Data Mining, Frequent Pattern Mining, Frequently Changing Section, NLP, semantic, classification, support vector machine, Bayes