
Research On Optimization Of Text Classification Based On Improved Rough Set Model

Posted on: 2019-02-05
Degree: Master
Type: Thesis
Country: China
Candidate: Y Liu
GTID: 2428330596465675
Subject: Mathematics
Abstract/Summary:
Text classification assigns unknown texts to one or more categories according to a given classification system or standard. Text data is increasingly massive, diverse, and changeable, which poses a great challenge to text classification technologies. Rough sets have natural advantages in dealing with uncertain and ambiguous data. The main idea of rough set theory is to obtain basic decision and classification rules through attribute reduction without reducing classification ability. Building on existing text classification technology, this thesis proposes a new text classification method based on rough sets. The specific research work is as follows.

Firstly, starting from the study of the differential relation, this thesis proposes an improved differential relation and a restricted differential relation. In constructing the relation, the degree of difference between attributes is redefined by combining the differential relation with the boundary control of tolerance rough sets. Based on the improved differential relation, an extended rough set model is built for incomplete information systems, which addresses the overly coarse knowledge granularity of rough sets and the limitations of classical rough sets in processing complex data.

Secondly, the variable precision rough set based on the misclassification rate is introduced into the improved differential relation, and an improved variable precision rough set model is constructed, which handles noise in the data well. Compared with the traditional variable precision rough set, the improved rough set achieves higher classification accuracy, and its classification results are more reasonable. Based on the improved rough sets, a new attribute reduction algorithm is proposed, which addresses the high dimensionality of the data and improves classification decision-making ability. Numerical experiments on UCI datasets verify the reduction ability of the new algorithm.

Finally, exploiting the correspondence between rough set attribute reduction and feature selection in text classification, this thesis introduces attribute reduction into an improved CHI feature selection and designs a new classification rule extraction algorithm. In this algorithm, the negative-contribution features from the improved CHI feature selection are incorporated into the rule extraction process, which generates negative decision rules. This allows the proposed algorithm to classify from negative evidence as well. On this basis, a new text classification method is constructed, and comparative numerical experiments verify the feasibility of the algorithm. Compared with the other classification methods considered in this thesis, the improved method increases the number of applicable texts by 12.86%.
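As background for the attribute reduction described above, the following is a minimal sketch of the classical (Pawlak) rough set approximations and the dependency degree that reduction algorithms typically try to preserve. It is not the improved differential relation or the variable precision model of the thesis. The toy decision table, the attribute names `rough` and `fuzzy`, and the function names are all assumptions made for illustration.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group object indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(set)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].add(i)
    return list(blocks.values())


def approximations(rows, attrs, target):
    """Return the classical lower and upper approximations of target."""
    lower, upper = set(), set()
    for block in partition(rows, attrs):
        if block <= target:   # block lies entirely inside the target set
            lower |= block
        if block & target:    # block overlaps the target set
            upper |= block
    return lower, upper


def dependency(rows, attrs, decision):
    """Dependency degree: size of the positive region divided by |U|."""
    positive = set()
    for decision_block in partition(rows, [decision]):
        lower, _ = approximations(rows, attrs, decision_block)
        positive |= lower
    return len(positive) / len(rows)


# Hypothetical decision table: two binary term features and a class label.
rows = [
    {"rough": 1, "fuzzy": 0, "label": "math"},
    {"rough": 1, "fuzzy": 0, "label": "math"},
    {"rough": 0, "fuzzy": 1, "label": "cs"},
    {"rough": 0, "fuzzy": 1, "label": "math"},
    {"rough": 0, "fuzzy": 0, "label": "cs"},
]

target = {i for i, r in enumerate(rows) if r["label"] == "math"}
lower, upper = approximations(rows, ["rough", "fuzzy"], target)
gamma_full = dependency(rows, ["rough", "fuzzy"], "label")
gamma_rough_only = dependency(rows, ["rough"], "label")
```

In this toy table, objects 2 and 3 share the same feature values but carry different labels, so they fall in the boundary region (the upper approximation minus the lower one). Dropping `fuzzy` lowers the dependency degree, so in the classical model that attribute could not be removed without losing classification ability.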
Keywords/Search Tags: Rough set, text classification, attribute reduction, differential relation, feature selection