Research And Implementation Of Uncertain XML Documents Classification Based On Extreme Learning Machine

Posted on: 2015-06-21
Degree: Master
Type: Thesis
Country: China
Candidate: C Wang
GTID: 2308330482457031
Subject: Computer technology
Abstract/Summary:
With the rapid development of the Internet, XML has become a representative document format in the IT industry. Owing to its good cross-platform compatibility, it is widely used in fields such as finance, e-commerce, web services, and data exchange. XML document classification is an important part of XML data management and mining and has therefore received considerable attention. Most existing XML document representation models and classification methods, however, are designed for deterministic XML documents. Because of network failures, delayed information updates, incomplete information extraction, and other factors, XML data carries inherent uncertainty, which poses substantial challenges for XML document classification. This thesis studies the classification of uncertain XML documents and designs a novel solution to it.

First, based on the characteristics of the uncertain XML document data models analyzed in this thesis, a set of uncertain XML document instances is generated to represent each uncertain XML document. Using the appearance probability of each instance of the uncertain XML document, the IU-ELM (Instance based Uncertain ELM) is proposed. For binary classification of uncertain XML documents, an optimized variant, the IBU-ELM (Instance based Binary Uncertain ELM), is proposed. Second, because the extreme learning machine based on appearance probability generates too many instances, a Monte Carlo sampling algorithm is proposed to reduce the number of uncertain XML document instances. Sampling yields a set of sample instances that represents an uncertain XML document, and on this basis the MCU-ELM (Monte Carlo based Uncertain ELM) is proposed.

The experimental results show that both the Instance based Uncertain ELM and the Monte Carlo based Uncertain ELM proposed in this thesis classify uncertain XML documents well and are superior to the support vector machine and the uncertainty learning machine. The Instance based Uncertain ELM is better suited to uncertain XML documents that contain few probability distribution nodes. The Monte Carlo based Uncertain ELM effectively reduces the running time of uncertain XML document classification, at some cost in classification accuracy.
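
The contrast between enumerating every instance and sampling a subset can be illustrated with a minimal Python sketch. It is not the thesis's implementation. It assumes a simplified, hypothetical document model in which each probability distribution node is a list of mutually exclusive (value, probability) alternatives and the nodes are independent. The names `enumerate_instances`, `sample_instances`, and `weighted_vote`, as well as the toy classifier standing in for a trained ELM, are illustrative assumptions.

```python
import itertools
import random
from collections import Counter

# Assumed model (hypothetical): a document is a list of distribution nodes;
# each node is a list of (value, probability) alternatives summing to 1.

def enumerate_instances(nodes):
    """Yield every (instance, probability) pair.

    The number of instances is the product of the node sizes, which is
    why full enumeration becomes costly as distribution nodes increase.
    """
    for combo in itertools.product(*nodes):
        values = tuple(value for value, _ in combo)
        prob = 1.0
        for _, p in combo:
            prob *= p  # nodes are assumed independent
        yield values, prob

def sample_instances(nodes, n, rng):
    """Draw n instances by sampling each node independently (Monte Carlo)."""
    samples = []
    for _ in range(n):
        instance = tuple(
            rng.choices([v for v, _ in node], weights=[p for _, p in node])[0]
            for node in nodes
        )
        samples.append(instance)
    return samples

def weighted_vote(weighted_instances, classify):
    """Sum the weights of the instances predicted for each label and
    return the label with the largest total (an assumed aggregation rule)."""
    scores = Counter()
    for instance, weight in weighted_instances:
        scores[classify(instance)] += weight
    return max(scores, key=scores.get)

# Toy document with two distribution nodes (made-up values).
nodes = [
    [("finance", 0.7), ("sports", 0.3)],
    [("stock", 0.6), ("match", 0.4)],
]

# Toy stand-in for a trained classifier such as an ELM.
def toy_classify(instance):
    return "finance" if instance[0] == "finance" else "sports"

instances = list(enumerate_instances(nodes))
exact_label = weighted_vote(instances, toy_classify)

samples = sample_instances(nodes, 200, random.Random(0))
sampled_label = weighted_vote([(s, 1.0) for s in samples], toy_classify)
```

In this sketch the exhaustive vote weights each instance by its appearance probability, while the sampled vote gives every drawn instance equal weight, since sampling already reflects the probabilities. This mirrors the trade-off described in the abstract: fewer instances to classify, at some cost in accuracy.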
Keywords/Search Tags: XML, uncertain, classification, Extreme Learning Machine