Classification Algorithm Research Of Deep Category In Large-scale Hierarchical Classification

Posted on:2018-11-14

Degree:Master

Type:Thesis

Country:China

Candidate:S C Liu


GTID:2348330518467049

Subject:Computer software and theory

Abstract/Summary:


With the rapid development of Internet big data, text and web data on the network are growing exponentially. To extract useful information from massive text collections quickly and accurately, users need to identify the topic of a text more precisely and label it with deeper categories. Text categorization has therefore become an important research topic, and large-scale multi-level text classification has been a research focus in recent years. Since the second half of 2009, researchers have paid great attention to large-scale multi-level text categorization, and research results have frequently been tested and analyzed in open international evaluations. A large-scale Chinese news classification evaluation held in 2014 analyzed the classification techniques of several competitors. However, the best system achieved a performance of less than 50%, which falls short of the needs of practical applications. High-precision automatic multi-level text classification therefore requires further research.

First, building on existing processing strategies for large-scale multi-level classification, this thesis adopts a flattening strategy and the principle of reducing the complex to the simple, organized as a two-stage process of search followed by classification. In the search stage, the weighting of the category hierarchy is analyzed and features are updated dynamically by combining the structural characteristics of the category hierarchy tree, the links between categories, and other implicit domain knowledge. A more discriminative feature set is built for each node of the category hierarchy tree. In addition, a depth-first search algorithm is used to narrow the search range, and a threshold-based pruning strategy is applied to find the best candidate categories for the text to be classified. Finally, the classic KNN classification algorithm is applied to the candidate categories to test the classification results and carry out a comparative analysis. Experimental results show that the proposed algorithm improves the average F1 value in the classification experiments and achieves satisfactory classification performance.

Second, building on a study of related algorithms in this field, a multi-stage KNN classification algorithm based on center vectors is proposed and applied to the classification stage of large-scale multi-level text classification. The algorithm adjusts the training samples according to their density, using a sample-trimming technique to make the sample distribution more uniform, and computes the center vector of each class. Then, provided that the accuracy of the class center vectors is guaranteed, the algorithm moves the complex computations of the classification stage forward into classifier training. In the final stage, the algorithm uses an appropriate value of m (the number of primary categories) to assign a category to the text to be classified according to its nearest neighbor. The experimental results show that the improved algorithm, without loss of classification accuracy, not only reduces computational complexity but also significantly improves classification speed.
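The two-stage idea summarized in the abstract (search, then classification) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract gives no formulas, so cosine similarity, averaged node profiles, the threshold value, and all names (`Node`, `build_profiles`, `search_candidates`, `classify`) are assumptions made for illustration. Density-based sample trimming and dynamic feature weighting are omitted, and the reading of m as "keep the m categories whose center vectors are closest, then take the nearest neighbor among their samples" is one plausible interpretation.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional

Vector = Dict[str, float]  # sparse document vector: term -> weight


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def center(vectors: List[Vector]) -> Vector:
    """Center vector: the term-wise mean of a list of sparse vectors."""
    total: Vector = {}
    for v in vectors:
        for t, w in v.items():
            total[t] = total.get(t, 0.0) + w
    return {t: w / len(vectors) for t, w in total.items()} if vectors else {}


@dataclass
class Node:
    """A category in the hierarchy tree (hypothetical structure)."""
    name: str
    children: List["Node"] = field(default_factory=list)
    samples: List[Vector] = field(default_factory=list)  # training docs (leaves)
    profile: Vector = field(default_factory=dict)        # per-node feature set


def build_profiles(node: Node) -> Vector:
    """Training step: leaf profile = center of its samples,
    inner profile = center of its children's profiles (an assumption)."""
    if node.children:
        node.profile = center([build_profiles(c) for c in node.children])
    else:
        node.profile = center(node.samples)
    return node.profile


def search_candidates(node: Node, doc: Vector, threshold: float,
                      out: Optional[List[Node]] = None) -> List[Node]:
    """Search stage: depth-first traversal that prunes any subtree whose
    profile similarity to the document is below the threshold."""
    if out is None:
        out = []
    for child in node.children:
        if cosine(doc, child.profile) < threshold:
            continue  # prune this subtree
        if child.children:
            search_candidates(child, doc, threshold, out)
        else:
            out.append(child)
    return out


def classify(root: Node, doc: Vector, threshold: float = 0.2,
             m: int = 2) -> Optional[str]:
    """Classification stage: keep the m candidates with the closest center
    vectors, then label the document by its single nearest training sample."""
    candidates = search_candidates(root, doc, threshold)
    if not candidates:
        return None
    primary = sorted(candidates, key=lambda c: cosine(doc, c.profile),
                     reverse=True)[:m]
    scored = [(cosine(doc, s), c.name) for c in primary for s in c.samples]
    return max(scored)[1] if scored else None


# Toy hierarchy with made-up categories and term weights.
root = Node("root", children=[
    Node("sports", children=[
        Node("football", samples=[{"ball": 1.0, "goal": 1.0},
                                  {"goal": 1.0, "team": 1.0}]),
        Node("tennis", samples=[{"ball": 1.0, "racket": 1.0}]),
    ]),
    Node("finance", children=[
        Node("stocks", samples=[{"stock": 1.0, "market": 1.0}]),
    ]),
])
build_profiles(root)
print(classify(root, {"ball": 1.0, "racket": 1.0}))
```

In this sketch the center vectors are computed once in `build_profiles`, so classification only compares the document against the m retained centers and the samples of those categories, which mirrors the abstract's point about moving computation from the classification stage into training.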

Keywords/Search Tags:

deep text classification, multi-stage classifier, K-Nearest Neighbor, depth first search


Related items

1	Research On Nearest Neighbor Search Method Based On Multi-stage Vector Quantization
2	Study On Generalized Nearest Neighbor Pattern Classification
3	Research On Ensemble Classification Based On Nearest Neighbor Multiple Classifier Selection And Application On License Plate Recognition
4	A Study On Chinese Text Categorization
5	Analysis Of Text Information Based On Deep Learning
6	Multiple Hash Tables Indexing And Optimization For Approximate Nearest Neighbor Search
7	Application Of Natural Neighbor In Text Classification
8	Research On Two-stage Hierarchical Text Classification Model Based On Neighbor-assistant Strategy
9	Research On Hashing Accelerated Approximate Nearest-Neighbor Search
10	Deep Compact Coding For Multimedia Nearest Neighbor Search