Hierarchical Multi-Label Classification Based On Auto Encoder

Posted on: 2018-11-05
Degree: Master
Type: Thesis
Country: China
Candidate: Y Yang
GTID: 2428330512497260
Subject: Computer technology
Abstract/Summary:
Classification is a classic problem in data mining. In hierarchical multi-label classification, unlike conventional classification, each instance can belong to two or more classes at once, and those classes must conform to a hierarchical relation. Such problems call for different methods: a multi-label problem can be decomposed into several simpler single-label problems, or a common single-label algorithm can be extended to handle multiple labels while satisfying the hierarchical constraint.

With the advance of judicial reform, which aims to promote judicial openness, large numbers of judgment documents containing the facts and applicable laws of legal cases are gradually being published on the Internet. This makes it possible to identify the applicable laws of a legal case automatically through data mining. The task is difficult for two reasons. First, judgment documents are free text, whereas conventional classification methods are usually designed for structured data. Second, the articles of law are organized in a hierarchy, and multiple articles at different levels can apply to a single case, which makes automatic recognition of applicable laws a hierarchical multi-label classification problem. A thorough analysis of the characteristics of the problem is therefore needed to find a practical and effective solution.

In this thesis, a system that automatically predicts the applicable laws of legal cases was built by mining judgment documents. The main contributions of this work are the following:

1. A hierarchical multi-label classification algorithm based on the denoising autoencoder, named dAE-HMC, is proposed. As a local hierarchical multi-label learning algorithm, dAE-HMC inspects the hierarchy of the label space in the training stage and expands the labels of the multi-label training set, so that the output of the algorithm satisfies the hierarchical constraint. In the prediction phase, the test sample is encoded at each level of the label hierarchy by a denoising autoencoder, and the labels at that level are then predicted with a softmax classifier. The prediction result at the upper level serves as input to the denoising autoencoder at the next level. The predictions of dAE-HMC therefore satisfy the hierarchical constraint directly, without additional correction.

2. A large number of judgment documents were collected by web crawling, and the facts and applicable laws of the legal cases were extracted from them. Using text mining techniques, the facts of each case were converted into structured text feature vectors, yielding a structured sample data set that contains the facts and the applicable laws of legal cases. The prediction model for applicable laws was built by running dAE-HMC on this data set. The experiments compared the performance of dAE-HMC under different parameter values and, through comparison with two other common hierarchical multi-label classification algorithms, showed that dAE-HMC is effective for automatic recognition of the applicable laws of legal cases.
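The label-expansion step mentioned in the first contribution can be illustrated with a small sketch. The abstract does not spell out how dAE-HMC expands labels, so the code below assumes the common interpretation: every label assigned to a training sample is supplemented with all of its ancestors in the hierarchy. The toy hierarchy `PARENT`, the label names, and the helper names `expand_labels` and `is_consistent` are hypothetical and are not taken from the thesis.

```python
from typing import Dict, Iterable, Optional, Set

# Hypothetical toy hierarchy (not from the thesis): child label -> parent label.
# None marks a top-level label.
PARENT: Dict[str, Optional[str]] = {
    "criminal_law": None,
    "criminal_law/theft": "criminal_law",
    "criminal_law/theft/aggravated": "criminal_law/theft",
    "civil_law": None,
    "civil_law/contract": "civil_law",
}


def expand_labels(labels: Iterable[str],
                  parent: Dict[str, Optional[str]]) -> Set[str]:
    """Return the given labels together with all of their ancestors.

    Assumption: "expanding" a label set means closing it under the
    parent relation, so a sample tagged with a deep label also carries
    every label on the path up to the top level.
    """
    expanded: Set[str] = set()
    for label in labels:
        node: Optional[str] = label
        # Walk upwards; stop early once we reach a branch already added.
        while node is not None and node not in expanded:
            expanded.add(node)
            node = parent[node]
    return expanded


def is_consistent(labels: Iterable[str],
                  parent: Dict[str, Optional[str]]) -> bool:
    """True if the parent of every non-top-level label is also present."""
    label_set = set(labels)
    return all(parent[l] is None or parent[l] in label_set
               for l in label_set)


if __name__ == "__main__":
    raw = ["criminal_law/theft/aggravated", "civil_law/contract"]
    full = expand_labels(raw, PARENT)
    print(sorted(full))
    print(is_consistent(raw, PARENT), is_consistent(full, PARENT))
```

Under this assumption, a training set whose label sets are all expanded in this way contains only hierarchy-consistent targets, which fits the abstract's claim that the predictions need no additional correction.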
Keywords/Search Tags: Text mining, Autoencoder, Hierarchical multi-label classification