Font Size: a A A

The Research Of An Large-Scale Artificial Neural Network Based And Extensible Text Classification Algorithm

Posted on:2010-03-04Degree:MasterType:Thesis
Country:ChinaCandidate:F B LiFull Text:PDF
GTID:2178360275451204Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
Nowadays, the machine learning based text classification algorithm has a wide range of applications in the field of subject-oriented search engine. However, its"one-time learning"problem has seriously hampered its applications in the field of integrated search engine. The"one-time learning"problem means that the training model should learn the mode at one time. When the trained model learns the new knowledge, it would destroy the existing knowledge. This means that once the category space fixed and the training completed, the model needs to be retrained when some categories needs to be added or removed. The trained model can't inherit the existing knowledge.The size of integrated search engine's category space is usually bigger than subject-oriented search engine. With the explosion of category space and training documents, because of the lack of the ability of inheriting the existing knowledge, the training time will be increasing, the learning efficiency will be decreasing. And sometimes there would be in a dangerous circumstance that the learning process fails. To resolve these problems, the idea of divide and conquer used in large-scale artificial neural network and dynamic multi-tree algorithm are employed, and an extensible text classification algorithm is implemented.1.Group oriented sub-classifierAccording to idea of divide and conquer, the category space will be divided into several groups for the large scale category space and the number of training documents. One training model will be trained for each group. Each sub-classifier has a good effect for categories in the group.2.Efficiently combining the sub-classifierWhen each model is combined by a simple way, the classification effect decreases a lot. To resolve this problem, Dynamic Multi-tree is used to organize the trained model of each group.3.Classes extension(Plasticity)When the category is needed to be extended, we only need to train the model for the added category and add the trained model to the Dynamic Multi-tree without the training of all the categories in the tree. It's the way to inherit the knowledge.There is an experiment in the issue to validate the extensible text classification algorithm and the result shows that the new algorithm has overcame the traditional learning algorithm's"one-time learning"problem, and the classification effect of both algorithm are similar. It also shows that the new algorithm has an ability to train the model parallel, to extend the category space, and to inherit the knowledge. Because of this, the new large-scale artificial neural network based and extensible text classification algorithm fits to apply in integrated search engine area.
Keywords/Search Tags:Search Engine, Text Classification, Pattern Recognition, Large-scale Artificial Neural Network, Dynamic Multi-tree
PDF Full Text Request
Related items