Research And Implementation Of XML Document Classification Based On Extreme Learning Machine

Posted on: 2012-03-03
Degree: Master
Type: Thesis
Country: China
Candidate: X Bi
Full Text: PDF
GTID: 2348330482456994
Subject: Computer application technology
Abstract/Summary:
XML documents, which naturally support user-defined structural information, have been widely applied in various domains, including finance and bioinformatics. How to manage XML data efficiently has become one of the most active subjects of academic study. This thesis focuses on XML document classification and proposes a high-performance solution to it.
To represent XML documents more effectively, an optimized model named DSVM (Distributed Structured Vector Model) is proposed. DSVM addresses the calculation defects of TFIDF in the traditional VSM (Vector Space Model) by taking the category distribution information fully into account. DSVM also retains a large share of the structural information, so it represents both the semantic and the structural information of an XML document much better.
This thesis also proposes an optimized v-ELM (voting-ELM) algorithm, based on voting theory and ELM (Extreme Learning Machine), to improve the performance of the classifier. In v-ELM, the OAO (One-against-one) method divides a multi-class classification problem into binary classification problems by training one binary classifier for each pair of classes. To resolve the ambiguous voting results that voting theory can produce, this thesis introduces three postprocessing methods. The REV (Revoting of Equal Votes) and p-REV (Probability Based Revoting of Equal Votes) methods revote when several classes tie for the maximum number of votes. The RCC (Revoting of Confusing Classes) method focuses on the confusing classes discovered during the training phase.
A series of experiments is conducted in this thesis. The evaluation results show that DSVM has an improved representation ability. The v-ELM algorithm with the p-REV and RCC postprocessing methods trades a small increase in training and testing time for considerably higher accuracy than traditional classification algorithms. The integrated XML document classification framework proposed in this thesis achieves satisfactory performance.
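The one-against-one voting step described in the abstract can be illustrated with a minimal sketch. It is not the thesis's v-ELM implementation: the helper names (`train_pairwise`, `vote`, `fit_midpoint`) are hypothetical, and a toy one-dimensional midpoint threshold stands in for each binary ELM. The sketch only shows how one classifier per class pair casts a vote and how equal maximum votes, the case that REV and p-REV are said to handle, can be detected.

```python
from collections import Counter
from itertools import combinations


def train_pairwise(samples, labels, fit_binary):
    """Train one binary classifier per unordered pair of classes (OAO)."""
    classes = sorted(set(labels))
    models = {}
    for a, b in combinations(classes, 2):
        # Keep only the samples that belong to the two classes of this pair.
        subset = [(x, y) for x, y in zip(samples, labels) if y in (a, b)]
        models[(a, b)] = fit_binary(subset, a, b)
    return models


def vote(models, x):
    """Return (winners, tally); more than one winner means equal max votes."""
    tally = Counter(model(x) for model in models.values())
    top = max(tally.values())
    winners = sorted(c for c, n in tally.items() if n == top)
    return winners, dict(tally)


def fit_midpoint(subset, a, b):
    """Toy stand-in for a binary ELM (assumption): 1-D midpoint threshold."""
    def mean(c):
        values = [x for x, y in subset if y == c]
        return sum(values) / len(values)

    mean_a, mean_b = mean(a), mean(b)
    cut = (mean_a + mean_b) / 2
    low, high = (a, b) if mean_a <= mean_b else (b, a)
    return lambda x: low if x < cut else high


# Illustrative data, not taken from the thesis.
samples = [0.0, 1.0, 5.0, 6.0, 10.0, 11.0]
labels = ["A", "A", "B", "B", "C", "C"]
models = train_pairwise(samples, labels, fit_midpoint)
winners, tally = vote(models, 6.0)

# A cyclic outcome gives every class one vote, so no single winner exists.
cyclic = {
    ("A", "B"): lambda x: "A",
    ("A", "C"): lambda x: "C",
    ("B", "C"): lambda x: "B",
}
tied_winners, tied_tally = vote(cyclic, 0.0)
```

With k classes this scheme trains k(k-1)/2 binary classifiers. When `vote` returns more than one winner, a tie-breaking step is needed; how REV, p-REV, and RCC perform that step is not specified in the abstract, so it is left out here.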
Keywords/Search Tags:xml, classification, extreme learning machine, voting theory