Research On Text Classification Based On Neural Network And Decision Tree And Its Application

Posted on: 2019-04-26
Degree: Master
Type: Thesis
Country: China
Candidate: F Lei
GTID: 2348330569495781
Subject: Engineering
Abstract/Summary:
The World Wide Web provides a convenient mechanism for publishing and acquiring documents, and it has become a gathering place for all kinds of information. The amount of information on the web is increasing exponentially, and how to mine useful patterns or knowledge from massive text data has become a hot research topic. In data mining, text is automatically classified according to a classification model so that content of interest can be found quickly. Text data is unstructured, subjective, and high dimensional, which makes it hard for text mining algorithms to extract effective and understandable classification rules and makes the computational complexity high. It is therefore challenging both to find a text feature selection method that reduces the dimensionality and to improve the text mining algorithm that produces the classification rules. In this context, this thesis studies a text feature selection method and a neural network classification algorithm based on a decision tree, and applies the resulting text classification system to presenting the development of Tibet. The main work is as follows:

(1) Data preprocessing
In the text preprocessing stage, the main improvements are: adding dynamic stop-word lists; optimizing the TF-IDF method by taking synonyms and word position into account when computing word frequency; and adding a document similarity algorithm for document deduplication.

(2) Feature selection method
A new feature selection method is proposed. It uses the sample deviation rate and the feature variance as evaluation criteria, ranks the feature attributes by importance, and selects the best subset of feature attributes. The experimental results show that the proposed method achieves higher classification accuracy than traditional feature selection based on word frequency, which demonstrates the feasibility and advantages of the proposed method.

(3) Neural network classification algorithm based on a decision tree
This thesis designs a neural network classification algorithm based on a decision tree, in which the decision tree is used to optimize both the initial weights and the structure of the neural network. The algorithm greatly reduces the randomness of the initial values found in a traditional neural network, gives a more reasonable choice of the number of hidden layers, and is more conducive to obtaining an optimal neural network model. The experimental results show that, compared with the traditional neural network, the proposed classification algorithm improves by 11% and its classification accuracy improves by 2.5%.

(4) Presentation of Tibet's development using the decision-tree-based neural network classification algorithm
Using the text classification model proposed in this thesis, a collection of Tibet-related texts is classified into politics and economy, culture, and education, and word cloud visualization is used to show Tibet's development in each of these areas. Finally, sentiment polarity analysis is used to build a sentiment polarity classifier for texts on Tibet's development.
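
As an illustration of the feature selection idea in (2), the following Python sketch ranks terms by the variance of their TF-IDF weights across documents. It is a simplified stand-in and not the method of the thesis: the abstract does not define the sample deviation rate, the synonym and position adjustments to TF-IDF are omitted, and the function names (`tf_idf`, `rank_by_variance`) and the toy documents are assumptions made for this example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return (vocab, rows); rows[i][j] is the TF-IDF weight of vocab[j] in docs[i].

    Plain TF-IDF only: relative term frequency times log(N / document frequency).
    """
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    rows = []
    for doc in docs:
        counts = Counter(doc)
        length = max(len(doc), 1)  # guard against empty documents
        rows.append(
            [(counts[t] / length) * math.log(n_docs / doc_freq[t]) for t in vocab]
        )
    return vocab, rows


def rank_by_variance(vocab, rows):
    """Rank terms by the population variance of their weights, highest first.

    Assumption: variance alone stands in for the thesis's combined criterion.
    """
    n_docs = len(rows)
    scores = {}
    for j, term in enumerate(vocab):
        column = [row[j] for row in rows]
        mean = sum(column) / n_docs
        scores[term] = sum((x - mean) ** 2 for x in column) / n_docs
    return sorted(vocab, key=lambda t: scores[t], reverse=True)


# Hypothetical, already tokenized toy documents.
docs = [
    ["tibet", "economy", "growth"],
    ["tibet", "culture", "festival"],
    ["tibet", "education", "school"],
]
vocab, rows = tf_idf(docs)
ranking = rank_by_variance(vocab, rows)
print(ranking)
```

In this toy corpus the term "tibet" occurs in every document, so its IDF and therefore its variance are zero and it is ranked last; a feature subset would be taken from the front of `ranking`.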
Keywords: text classification, feature selection, neural network, decision tree, visualization