
Research on Attribute Reduction and Sample Reduction Algorithms Based on Neighborhood Rough Sets and Their Application in Text Categorization

Posted on: 2016-02-14
Degree: Master
Type: Thesis
Country: China
Candidate: H L Liang
Full Text: PDF
GTID: 2298330470951648
Subject: Electronics and Communications Engineering
Abstract/Summary:
With the rapid development of science and technology, the volume of data in every field has increased dramatically. To extract useful information from this vast amount of data, the data must first be processed. In machine learning, a data set often contains a large number of samples, but not all samples or attributes contribute to the classifier, and some may even lower the classification precision. Reducing the samples or attributes not only shortens classification time and improves efficiency but also saves storage space. Attribute reduction and sample reduction are therefore an important step before classification and have theoretical significance.

The neighborhood rough set (NRS) model extends classical rough set theory and overcomes its limitation of handling only symbolic attributes. NRS can handle both symbolic attributes and continuous numerical attributes, which gives it practical significance. Based on the basic idea of NRS, this thesis improves the positive-region-based attribute reduction algorithm, which considers only the samples that are correctly distinguished, and analyzes how an uneven distribution of samples in the sample space affects classification precision during sample reduction. The main contributions are:

(1) Distinguished object sets are defined and their basic properties are discussed, and a new attribute significance measure based on distinguished object sets is presented. Because attribute reduction based on the positive region considers only the correctly distinguished samples, this thesis uses the upper approximation and the concept of neighborhood information granules to introduce distinguished object sets.

(2) An improved attribute reduction algorithm is designed. The algorithm considers both the relative positive region of the decision table and the influence of boundary samples as condition attributes are added. The attribute whose addition causes the largest change in the boundary samples is taken to have the greatest influence on the decision table, and this criterion is used to carry out attribute reduction of the decision table.

(3) A sample reduction algorithm based on the attributes is designed. A density-based sample reduction method applying NRS is proposed to reduce the training samples. The category whose density differs most from the average category density is selected. Within that category, the sample with the most neighborhood samples under the neighborhood model is chosen, and its neighborhood samples are then removed from the category. This removes redundant samples from the category with the largest density difference in the original sample set and makes the distribution of samples across categories more uniform.

(4) The attribute reduction and sample reduction algorithms are applied to text categorization. Because Chinese text cannot be processed directly, a corresponding NRS text classification preprocessing procedure is established so that NRS can handle Chinese text, which in turn helps users obtain the required information promptly and accurately. The experimental results show that the improved heuristic attribute reduction algorithm based on NRS reduces the number of text keywords and that the density-based sample reduction algorithm reduces the number of Chinese texts. Together, the two algorithms reduce the dimension of the text sets and have practical significance and application value.
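
To make the neighborhood model concrete, the following is a minimal Python sketch of the standard NRS positive region and the dependency measure that positive-region-based attribute reduction relies on. It is not the improved algorithm of the thesis. The function names, the Euclidean distance, the threshold value delta = 0.2, and the toy data are all assumptions chosen for illustration.

```python
from math import dist

def neighborhood(samples, i, attrs, delta):
    """Indices of all samples within distance delta of sample i,
    measured only on the chosen attribute subset (assumed Euclidean)."""
    xi = [samples[i][a] for a in attrs]
    return [j for j, x in enumerate(samples)
            if dist(xi, [x[a] for a in attrs]) <= delta]

def positive_region(samples, labels, attrs, delta):
    """Samples whose whole delta-neighborhood carries their own decision label."""
    return [i for i in range(len(samples))
            if all(labels[j] == labels[i]
                   for j in neighborhood(samples, i, attrs, delta))]

def dependency(samples, labels, attrs, delta):
    """Fraction of samples in the positive region for this attribute subset."""
    return len(positive_region(samples, labels, attrs, delta)) / len(samples)

# Toy decision table: two continuous condition attributes, two classes.
# The values are made up for illustration only.
samples = [(0.10, 0.50), (0.15, 0.90), (0.50, 0.10), (0.55, 0.60), (0.90, 0.20)]
labels = ["a", "a", "a", "b", "b"]

# Attribute 0 alone leaves samples 2 and 3 in the boundary;
# adding attribute 1 separates them.
print(positive_region(samples, labels, [0], 0.2))
print(dependency(samples, labels, [0, 1], 0.2))
```

A greedy reduction in this style would repeatedly add the attribute that raises the dependency most. The thesis additionally weighs the change in boundary samples, which this sketch does not attempt to reproduce.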
Keywords/Search Tags:rough sets, neighborhood model, attribute reduction, distribution density, sample reducing, text categorization