Research On Data Mining Technology Of Semi-structured Data

Posted on: 2019-01-23
Degree: Master
Type: Thesis
Country: China
Candidate: G H Liu
Full Text: PDF
GTID: 2428330548985709
Subject: Computer application technology
Abstract/Summary:
With the spread of the Internet and the frequent exchange of heterogeneous data, the volume of semi-structured data has grown rapidly. Extracting valuable knowledge and information from this data has become a major challenge in data mining. Semi-structured data is difficult to mine because it lacks the strict schema of structured data, so existing data mining algorithms cannot be applied to it directly. This thesis therefore designs and implements a semi-structured data mining system. After analyzing the technologies related to semi-structured data and comparing a variety of data mining algorithms, XML data and sequential pattern mining were chosen as the research direction.

The main work of this thesis is as follows. First, the Label Sequence Representation was selected as the representation method for XML data, and on this basis the Composite Label Sequence Representation was proposed for representing multiple XML documents. Second, the advantages and disadvantages of the PrefixSpan sequential pattern mining algorithm were analyzed in detail, and the concept of projection coordinates was proposed. The IPBPC algorithm, which improves PrefixSpan by means of projection coordinates, was then designed. The prefixspan-simp algorithm for mining simple sequence databases was obtained by optimizing the IPBPC algorithm. Experiments were designed to verify the efficiency of the IPBPC and prefixspan-simp algorithms. Finally, the semi-structured data mining system was designed and implemented. The system covers the workflow from data import through sequence mining to result display, demonstrating its feasibility and practicability. The experimental results showed that the system can solve the semi-structured data mining problem to a certain extent.
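
As a rough illustration of the pipeline the abstract describes (XML documents turned into label sequences, then mined for sequential patterns), the following is a minimal sketch and not the thesis's method. It assumes that a label sequence is simply the element tags in document order, which may differ from the Label Sequence Representation defined in the thesis, and it uses the textbook PrefixSpan algorithm for single-item sequences rather than IPBPC or prefixspan-simp. Projected databases are stored as (sequence index, offset) pairs, the standard pseudo-projection technique; this is not claimed to be the thesis's "projection coordinates". The names `label_sequence` and `prefixspan` and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def label_sequence(xml_text):
    """Flatten one XML document into its tags in document (preorder) order.

    Assumption: this stands in for the thesis's label sequence, whose exact
    definition is not given in the abstract.
    """
    root = ET.fromstring(xml_text)
    return [elem.tag for elem in root.iter()]


def prefixspan(sequences, min_support):
    """Return {pattern tuple: support} for all frequent subsequences.

    Each sequence is a list of single items. A projected database is kept as
    (sequence index, start offset) pairs instead of copied suffixes.
    """
    results = {}

    def grow(prefix, projected):
        # For each item, record its first occurrence in every projected suffix.
        occurrences = defaultdict(list)
        for seq_id, start in projected:
            seq = sequences[seq_id]
            seen = set()
            for pos in range(start, len(seq)):
                item = seq[pos]
                if item not in seen:
                    seen.add(item)
                    occurrences[item].append((seq_id, pos + 1))
        # Extend the prefix with every item that is frequent in the projection.
        for item, next_projected in occurrences.items():
            if len(next_projected) >= min_support:
                pattern = prefix + (item,)
                results[pattern] = len(next_projected)
                grow(pattern, next_projected)

    grow((), [(i, 0) for i in range(len(sequences))])
    return results


# Hypothetical sample documents, used only to exercise the sketch.
docs = [
    "<order><customer/><item/><item/></order>",
    "<order><item/><customer/></order>",
    "<order><customer/><item/></order>",
]
sequences = [label_sequence(d) for d in docs]
patterns = prefixspan(sequences, min_support=2)
for pattern, support in sorted(patterns.items()):
    print(pattern, support)
```

Storing offsets instead of copying suffixes keeps each projection proportional to the number of supporting sequences, which is the usual motivation for pseudo-projection in PrefixSpan implementations.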
Keywords/Search Tags: Semi-structured Data, Data Mining, Sequence Pattern Mining, XML, PrefixSpan algorithms