Classification Algorithm Research Of Deep Category In Large-scale Hierarchical Classification

Posted on: 2018-11-14  Degree: Master  Type: Thesis
Country: China  Candidate: S C Liu
GTID: 2348330518467049  Subject: Computer software and theory
Abstract/Summary:
With the rapid development of Internet big data, the volume of text and web data on the network has grown exponentially. To extract useful information from this mass of text quickly and accurately, users need to identify the topic of a text more precisely and label it with a deeper category. Text categorization has therefore become an important research topic, and large-scale multi-level text classification has been a research focus in recent years. Since the second half of 2009, researchers have paid great attention to large-scale multi-level text categorization techniques, and their results have frequently been tested and analyzed in open international evaluations. A large-scale Chinese news classification evaluation held in 2014 analyzed the classification techniques of several competitors. However, the best-performing system scored below 50%, which makes it difficult to meet the needs of practical applications. High-precision automatic multi-level text classification therefore needs further research.

First, building on a study of existing strategies for large-scale multi-level classification, the thesis adopts a flattening strategy and simplifies the problem by splitting it into two stages: search and classification. In the search stage, the weighting of the category hierarchy is analyzed and features are updated dynamically by combining the structural characteristics of the category hierarchy tree, the links between related categories, and other implicit domain knowledge. A more discriminative feature set is built for each node of the category hierarchy tree. Depth-first search is used to narrow the search range, and a threshold-based pruning strategy is applied to find the best candidate categories for the text to be classified. Finally, the classic KNN classification algorithm is applied on the basis of the candidate categories to test the classification results and carry out a comparative analysis. Experimental results show that the proposed algorithm improves the average F1 value in the classification experiments and that the classification effect is good.

Second, building on related algorithms in this field, a multi-stage KNN classification algorithm based on class center vectors is proposed and applied to the classification stage of large-scale multi-level text classification. The algorithm adjusts the training samples using a density-based idea, trimming samples so that their distribution approaches a more uniform state, and then computes the center vector of each class. While preserving the accuracy of the class center vectors, it moves the complex computations of the classification stage forward into the classifier training process. In the last stage, the algorithm chooses an appropriate value of m (the primary category number) and assigns a category to the text to be classified according to its nearest neighbors. The experimental results show that the improved algorithm, with no loss of classification accuracy, not only reduces computational complexity but also significantly improves classification speed.
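The two-stage idea described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The tree layout, the sparse-dictionary term vectors, cosine similarity as the matching score, and all names (`search_candidates`, `class_centers`, `top_m_categories`, `knn_vote`, the toy data) are assumptions made for illustration. Dynamic feature updating and density-based sample trimming are omitted.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def search_candidates(root, doc, threshold):
    """Search stage: depth-first traversal of the category tree.
    A subtree is pruned when its node scores below the threshold;
    surviving leaf categories become candidates."""
    candidates, stack = [], [root]
    while stack:
        node = stack.pop()
        for child in node["children"]:
            if cosine(doc, child["features"]) < threshold:
                continue  # prune this whole subtree
            if child["children"]:
                stack.append(child)
            else:
                candidates.append(child["name"])
    return candidates

def class_centers(samples):
    """Training-time step: mean vector of each class."""
    sums, counts = {}, {}
    for label, vec in samples:
        counts[label] = counts.get(label, 0) + 1
        acc = sums.setdefault(label, {})
        for t, w in vec.items():
            acc[t] = acc.get(t, 0.0) + w
    return {lab: {t: w / counts[lab] for t, w in acc.items()}
            for lab, acc in sums.items()}

def top_m_categories(doc, centers, candidates, m):
    """Keep the m candidate categories whose centers are closest to doc."""
    ranked = sorted(candidates, key=lambda c: cosine(doc, centers[c]),
                    reverse=True)
    return ranked[:m]

def knn_vote(doc, samples, allowed, k):
    """Similarity-weighted KNN vote restricted to the allowed categories."""
    neighbors = sorted(((cosine(doc, vec), label)
                        for label, vec in samples if label in allowed),
                       reverse=True)[:k]
    votes = {}
    for sim, label in neighbors:
        votes[label] = votes.get(label, 0.0) + sim
    return max(votes, key=votes.get) if votes else None

# Toy data (hypothetical), only to show how the pieces fit together.
def leaf(name, features):
    return {"name": name, "features": features, "children": []}

tree = {"name": "root", "features": {}, "children": [
    {"name": "sports", "features": {"ball": 1, "goal": 1, "hoop": 1},
     "children": [leaf("football", {"goal": 1, "ball": 1}),
                  leaf("basketball", {"hoop": 1, "ball": 1})]},
    {"name": "finance", "features": {"stock": 1, "market": 1},
     "children": [leaf("stocks", {"stock": 1})]},
]}
samples = [("football", {"goal": 1, "ball": 1}),
           ("football", {"goal": 1, "pitch": 1}),
           ("basketball", {"hoop": 1, "ball": 1}),
           ("stocks", {"stock": 1})]
doc = {"goal": 1, "ball": 1}

candidates = search_candidates(tree, doc, threshold=0.3)
top = top_m_categories(doc, class_centers(samples), candidates, m=1)
label = knn_vote(doc, samples, allowed=set(top), k=3)
```

In this sketch the class centers are computed once from the training samples, so at classification time the text is compared against only a few center vectors before the KNN vote. This mirrors the abstract's point about moving computation into training, under the assumptions stated above.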
Keywords/Search Tags: deep text classification, multi-stage classifier, K-Nearest Neighbor, depth first search