
Research On XML Classification Based On ELM

Posted on: 2017-05-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Bi
GTID: 1318330542977138
Subject: Computer software and theory
Abstract/Summary:
With the development of Internet technology, XML has become one of the de facto standards for web data storage, representation and exchange. As one of the key tasks of data mining, classification supports efficient XML data management and the extraction of valuable information. The XML classification problem has therefore become one of the most important research topics in XML data management. Among state-of-the-art classification algorithms, ELM (Extreme Learning Machine) has been widely applied in fields including voice and image recognition, financial market analysis and social data mining, owing to its extremely fast learning speed, good generalization performance and universal approximation ability. However, a number of problems remain to be resolved, including how to improve the representation ability of the XML representation model and how to improve the classification ability of ELM-based algorithms.

Furthermore, emerging applications of XML classification bring new challenges: 1) in large-scale XML classification applications, the training samples are stored separately on a distributed file system, which requires a distributed algorithm for converting XML data into the representation model and distributed training strategies based on ELM; 2) in social stream classification applications, XML stream classification requires a classifier with extremely fast learning and update speed, provided that concept drift is handled; 3) in uncertain XML classification applications, the representation, training and prediction of uncertain XML data are the key problems. Motivated by these applications, this dissertation focuses on XML classification based on ELM and studies the classification of large-scale XML data, social media streams and uncertain XML data. Extensive experiments on real-world and synthetic datasets are conducted to verify the effectiveness, efficiency and scalability of the proposed algorithms.

To summarize, the major contributions of this dissertation are as follows.

(1) To address the problem of XML classification based on ELM, this dissertation proposes an improved method for calculating feature values and an improved XML representation model, DSVM, which increase the ability to represent the semantic and structural information of XML. An improved classification algorithm based on voting theory, named v-ELM, is also proposed. Together with its postprocessing methods, v-ELM improves classification performance. To reduce the dependence on postprocessing methods, another improved algorithm, PV-ELM, is proposed, which applies probabilistic votes. A series of experiments is conducted to verify the classification performance of v-ELM and PV-ELM.

(2) For large-scale XML data classification in the cloud, this dissertation first analyzes the challenges of distributed XML representation conversion and distributed ELM training, and introduces the related definitions of the ELM feature space. A distributed implementation of ELM with kernels, named DK-ELM, is then proposed. It includes a distributed RBF kernel matrix calculation method, D-RBF; a distributed matrix-vector multiplication method, DMXV; and a distributed matrix inversion algorithm based on matrix decomposition. Furthermore, XML learning problems in the ELM feature space are studied. The existing distributed ELM algorithms PELM and POS-ELM are analyzed, and a distributed clustering algorithm based on k-Means is proposed. Extensive experiments are conducted to verify the classification performance and scalability of the proposed algorithms.

(3) For social stream classification, this dissertation analyzes the problem definition of XML data stream classification and proposes a baseline algorithm named BS-ELM. Based on an ensemble strategy, ES-ELM is proposed, which trains several classifiers with an update strategy: classifiers whose classification performance falls below a threshold are eliminated and replaced by new classifiers. Based on incremental calculation, OSS-ELM is proposed to utilize all of the historical learning experience. OSS-ELM takes advantage of online sequential learning and performs update operations with little recalculation cost. Experimental results show that the proposed algorithms achieve good classification performance in social stream classification applications.

(4) For uncertain XML classification, this dissertation studies the uncertain XML data model and representation model, and gives a formal definition of the uncertain XML data classification problem. PU-ELM is proposed, which uses the appearance probability of derived instances as the classification probability. To avoid enumerating the exponential number of possible-world instances, the UXSampling method is proposed to sample the uncertain XML data, and a sampling-based algorithm, SU-ELM, is built on it. Experimental results show that the proposed algorithms are suitable for uncertain XML data classification applications.
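
As a rough illustration of the ideas behind contribution (1), the sketch below shows a basic ELM classifier (random hidden layer, output weights obtained in closed form by least squares) and a simple majority vote over several independently initialized ELMs. It is a minimal sketch, not the dissertation's v-ELM or PV-ELM: the class name ELM, the function vote_predict, the sigmoid activation, the hidden-layer size and the toy two-class data are all assumptions made for illustration, and no XML representation model such as DSVM is involved.

```python
import numpy as np


class ELM:
    """Single-hidden-layer ELM: random hidden weights, closed-form output weights."""

    def __init__(self, n_hidden, rng):
        self.n_hidden = n_hidden
        self.rng = rng

    def _hidden(self, X):
        # Sigmoid activation of the random hidden layer (never trained).
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y, n_classes):
        # Hidden-layer parameters are drawn once at random and then fixed.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        T = np.eye(n_classes)[y]            # one-hot target matrix
        H = self._hidden(X)                 # hidden-layer output matrix
        self.beta = np.linalg.pinv(H) @ T   # least-squares output weights
        return self

    def predict(self, X):
        return np.argmax(self._hidden(X) @ self.beta, axis=1)


def vote_predict(models, X, n_classes):
    """Majority vote over the class labels predicted by each ELM."""
    votes = np.zeros((X.shape[0], n_classes), dtype=int)
    rows = np.arange(X.shape[0])
    for m in models:
        votes[rows, m.predict(X)] += 1
    return np.argmax(votes, axis=1)


# Toy data (assumed for illustration): two well-separated Gaussian blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.5, size=(50, 2)),
               rng.normal(2.0, 0.5, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

# Seven ELMs that differ only in their random hidden-layer parameters.
models = [ELM(n_hidden=20, rng=rng).fit(X, y, n_classes=2) for _ in range(7)]
pred = vote_predict(models, X, n_classes=2)
accuracy = float(np.mean(pred == y))
```

Because each ELM differs only in its random hidden-layer parameters, voting across several of them is one common way to reduce the variance that this randomness introduces. A probabilistic variant could sum per-class scores instead of counting hard labels, though how PV-ELM actually defines its votes is not specified in this abstract.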
Keywords/Search Tags:XML classification, Extreme Learning Machine, distributed computing, data stream, uncertain data