
Research On The Text Hierarchy Construction And Classification Method Based On The Relax Strategy

Posted on: 2017-09-26
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Du
Full Text: PDF
GTID: 2348330503992874
Subject: Computer Science and Technology

Abstract/Summary:
As digital information resources grow, the volume of text data has become very large. To manage and use this data effectively, automatic text classification technology was developed. It helps process and organize large amounts of unstructured text and thereby improves the effectiveness of text retrieval.

Most text classification adopts a flat classification method. When the numbers of texts and classes are large, the performance of flat classification declines quickly, especially in running time. Hierarchy classification is commonly used in multiclass text classification tasks; without loss of accuracy, it can be faster than flat methods. To further improve the accuracy of text hierarchy classification, this paper proposes a hierarchy construction method based on the relax strategy. It then proposes a soft decision classification algorithm for the relax strategy hierarchy, designed around the characteristics of that hierarchy. It also tries applying different feature extraction methods to hierarchy classification. The main contributions of this paper are as follows:

(1) Solution to the “block” problem in hierarchy classification: relax strategy
Hierarchy classification always suffers from a “block” problem. To further improve the accuracy of text hierarchy classification and to reduce the accuracy loss caused by the “block” problem, this paper studies and improves the text hierarchy construction method, proposes a hierarchy construction method based on the relax strategy, and uses it for text hierarchy classification.

(2) Performance improvement of hierarchy classification: soft decision method
The classification path chosen in the hierarchy may not be globally optimal. Combined with the “block” problem, this often gives text hierarchy classification methods low accuracy. Therefore, this paper uses the soft decision method to classify within the constructed relax strategy hierarchy. It then studies the characteristics of the relax strategy hierarchy and adjusts the soft decision hierarchy classification algorithm to fit that kind of hierarchy, improving accuracy further.

(3) Application of feature selection and feature computation methods: Least Information Theory (LIT)
This paper introduces LIT and tries several feature extraction methods to observe how well they suit text hierarchy classification. For feature selection, it introduces Least Information Gain (LIG) into the experiments. For feature computation, it selects Least Information Binary (LIB), Least Information Frequency (LIF), and LIB*LIF, all of which come from LIT.

The experiments show that the text hierarchy constructed by the proposed method is more reasonable. Applying the improved soft decision classification algorithm to the relax strategy hierarchy further increases classification accuracy and outperforms traditional algorithms. Meanwhile, introducing the LIT-based feature selection and feature computation methods improves classification performance significantly.
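The difference between a hard (greedy) decision and a soft decision over a class hierarchy can be illustrated with a minimal sketch. It is not the thesis's algorithm: the hierarchy, the class names, and the probabilities below are all made up for illustration, and it assumes that the “block” problem refers to the usual situation in which a wrong choice at an upper node cannot be corrected further down. The soft decision here simply scores every root-to-leaf path by the product of its node probabilities.

```python
# Hypothetical two-level hierarchy: root -> internal nodes -> leaf classes.
CHILDREN = {
    "root": ["science", "sports"],
    "science": ["physics", "biology"],
    "sports": ["football", "tennis"],
}

# Made-up probabilities P(child | parent, document) for a single document.
PROBS = {
    "science": 0.55, "sports": 0.45,
    "physics": 0.60, "biology": 0.40,
    "football": 0.95, "tennis": 0.05,
}


def hard_decision(node="root"):
    """Greedy top-down: commit to the single best child at every level."""
    while node in CHILDREN:
        node = max(CHILDREN[node], key=PROBS.get)
    return node


def soft_decision(node="root", score=1.0):
    """Score every root-to-leaf path by the product of its probabilities
    and return the best (leaf, path_score) pair."""
    if node not in CHILDREN:
        return node, score
    candidates = (
        soft_decision(child, score * PROBS[child]) for child in CHILDREN[node]
    )
    return max(candidates, key=lambda pair: pair[1])


if __name__ == "__main__":
    print("hard:", hard_decision())
    print("soft:", soft_decision())
```

In this toy case the greedy path commits to "science" at the root and can only reach "physics" (path score 0.55 × 0.60 = 0.33), while the soft decision keeps the "sports" branch alive and selects "football" (0.45 × 0.95 ≈ 0.43). Enumerating all paths is only practical for small hierarchies; a real system would likely prune low-scoring branches.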
Keywords/Search Tags: Relax strategy, Soft decision, Hierarchy construction, Hierarchy classification, Feature extraction