Research Of Data Mining Using XML Frequently Changing Sections

Posted on:2013-03-25

Degree:Master

Type:Thesis

Country:China

Candidate:J N Zhang

Full Text:PDF

GTID:2248330371482744

Subject:Computer software and theory

Abstract/Summary:

XML has become the de facto standard for data storage and exchange because it is self-describing, extensible, and highly portable across platforms. The volume of data in XML form keeps growing, which makes the discovery of interesting knowledge more challenging. Traditional data mining, including classification, clustering, and association rule mining, assumes data that either carries no format information or follows a specified format; little has been done for semi-structured data. Moreover, XML documents change over time, and this dynamic behavior deserves attention.

This thesis studies XML data, taking into account the information and the changes in both structure and content. Several dynamic metrics are presented; they introduce a weight to differentiate the types of change. Frequently Changing Sections (FCS) are defined on the basis of these metrics, an approach is proposed to mine them, and a classifier is built on top of FCS.

The thesis is organized around the concept of Frequently Changing Sections and covers the following aspects:

1. The first two chapters present the background and basic knowledge. First, the current state of research and the shortcomings of existing work are discussed. Then the basics of XML and a set of related technical standards, especially the DOM, are reviewed. Finally, the progress of data mining is introduced, along with several kinds of interesting knowledge and the common algorithms used to mine them.

2. The third chapter discusses Frequently Changing Sections. First, approaches for detecting the differences between XML documents are presented; the difference can be represented by a sequence of edit operations. Then several dynamic metrics are presented that consider both structure and content information and introduce a weight to differentiate the types of change. An approach, SC-Mining, is proposed to mine Frequently Changing Sections. HSC-DOM is introduced to reduce the number of scans, and some optimization techniques are introduced to make the process more efficient.

3. The fourth chapter discusses classification based on Frequently Changing Sections. First, the Vector Space Model with Frequently Changing Sections is discussed, followed by the algorithm for calculating similarity. Before classification, clustering is applied to learn more about the classification. The classifier is composed of a Support Vector Machine and a Bayes classifier. The experiments show that the algorithm is efficient.
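The abstract does not give formulas for the weighted dynamic metrics, so the following Python fragment is only a minimal sketch of the general idea, not the thesis's SC-Mining algorithm. It assumes that each pair of consecutive document versions has already been reduced to a list of edit operations, that each operation type carries a hypothetical weight, and that a section counts as frequently changing when its average weighted change per delta reaches a user-chosen threshold. The names `ASSUMED_WEIGHTS`, `weighted_change_scores`, and `frequently_changing`, the weight values, and the sample paths are all illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical weights per edit-operation type (not taken from the thesis).
ASSUMED_WEIGHTS = {"insert": 1.0, "delete": 1.0, "update": 0.5}


def weighted_change_scores(deltas, weights=ASSUMED_WEIGHTS):
    """Return, per section path, the average weighted change per delta.

    `deltas` holds one entry per pair of consecutive versions; each entry
    is a list of (section_path, operation) tuples.
    """
    if not deltas:
        return {}
    totals = defaultdict(float)
    for delta in deltas:
        # Count each section once per delta, using its heaviest edit.
        heaviest = {}
        for path, op in delta:
            heaviest[path] = max(heaviest.get(path, 0.0), weights[op])
        for path, weight in heaviest.items():
            totals[path] += weight
    return {path: total / len(deltas) for path, total in totals.items()}


def frequently_changing(deltas, min_score, weights=ASSUMED_WEIGHTS):
    """Return the sorted section paths whose score is at least `min_score`."""
    scores = weighted_change_scores(deltas, weights)
    return sorted(path for path, score in scores.items() if score >= min_score)


# Three illustrative deltas between four versions of one document.
deltas = [
    [("/catalog/book/price", "update"), ("/catalog/book", "insert")],
    [("/catalog/book/price", "update")],
    [("/catalog/book/price", "update"), ("/catalog/author", "delete")],
]
print(frequently_changing(deltas, min_score=0.5))
```

In this sketch the price section changes in every delta, so it passes the threshold even though each individual update carries a low weight, while sections touched only once do not. A real implementation would derive the edit operations from an XML differencing step and would likely aggregate scores over subtrees rather than flat paths.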

Keywords/Search Tags:

XML, Data Mining, Frequent Pattern Mining, Frequently Changing Section, NLP, semantic, classification, support vector machine, Bayes

Related items

1	A Novel Classification Based On Sequential Pattern Mining In Videos
2	Algorithms For Data Stream Mining
3	Research Of Dynamic XML Document Mining Using Frequently Changing Structures
4	Research On Classification Algorithms Of Data Mining Based On Imbalanced Data Sets
5	Discovering Frequently Changing Structures From Historical Versions Of XML Document
6	Study On Frequent Pattern Mining Algorithms And Pruning Strategies
7	The Research And Realization Of Mining Frequent Patterns On Business Data Streams
8	Research On Classification Algorithm Of Data Mining Based On Improved Support Vector Machine
9	Study On Frequent Subtree Mining And Its Application In XML Mining
10	The Research Of Data Mining Based On Support Vector Machine