
Tolerance Granular Computing Model And Its Research On Data Mining

Posted on: 2013-03-31
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Meng
Full Text: PDF
GTID: 1228330395499289
Subject: Computer application technology
Abstract/Summary:
Granular computing is a new idea and method with outstanding advantages in data mining. It provides a powerful tool for mining massive data and for solving complex problems. Classical rough set theory is restricted to equivalence relations on the universe. In practice, an equivalence relation is often too strict a requirement to be satisfied, which limits its further application. If transitivity is dropped, the equivalence relation weakens to a tolerance relation, which is only reflexive and symmetric. A tolerance relation induces a covering of the universe rather than a partition. Tolerance rough set theory is usually based on incomplete information systems. In this work, neighborhoods are taken as granules, and knowledge is abstracted as a derived partition of the universe. The derived partition corresponds to an equivalence relation. These constructions do not rely on the specific description of the problem. By constructing a tolerance granular computing model, several problems of granular computing are studied, including granulation, computation with granules, classification, and clustering. The main contributions are as follows:
(1) From the point of view of set theory, rough approximations based on tolerance relations and neighborhood systems are proposed. Their properties are proved and the accuracy measure is discussed. We find a one-to-one correspondence between tolerance knowledge bases and tolerance information tables. Pawlak's complete theorem is extended to a general complete theorem. Attribute reduction and rule extraction methods based on neighborhood dependency and center dependency are proposed, and their generality is analyzed through examples.
(2) A new symbolic representation method based on granulation is proposed, and information granulation is applied to time series classification. The time series are segmented, an information granule is constructed for each segment, and the similarity between the granules of the segments is computed. Spectral clustering is then applied to the resulting similarity matrix. Experiments on four time series datasets from the UCR Time Series Data Mining Archive show that the proposed granulation works successfully with a Hidden Markov Model. Compared with the supervised method and the self-training method, the semi-supervised method can construct accurate classifiers from very little labeled data.
(3) A linear neighborhood propagation (LNP) method based on rough K-means clustering is investigated. By analyzing the approximate distribution of the data, the method tests whether two data points lie in the upper or the lower approximation, which provides information beyond the distance between them. This information is used to adjust the choice of neighborhoods when the graph is constructed. Experiments on UCI datasets show that the method is more effective than LNP.
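
The tolerance-based rough approximations and the accuracy measure mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the dissertation's own construction. It assumes the common convention for incomplete information tables in which two objects are tolerant when they agree on every attribute where both values are known. The missing-value marker `*`, the toy table, and all function names are hypothetical and chosen only for illustration.

```python
MISSING = "*"  # assumed marker for an unknown attribute value


def tolerant(x, y):
    """True if two attribute tuples agree wherever both values are known."""
    return all(a == b or a == MISSING or b == MISSING for a, b in zip(x, y))


def tolerance_classes(table):
    """Map each object to its neighborhood T(x) = {y : x and y are tolerant}."""
    return {x: {y for y in table if tolerant(table[x], table[y])} for x in table}


def approximations(table, target):
    """Lower and upper approximations of `target` under the tolerance relation."""
    classes = tolerance_classes(table)
    lower = {x for x, nb in classes.items() if nb <= target}  # T(x) inside target
    upper = {x for x, nb in classes.items() if nb & target}   # T(x) meets target
    return lower, upper


def accuracy(lower, upper):
    """Accuracy measure |lower| / |upper| (taken as 1.0 for an empty upper set)."""
    return len(lower) / len(upper) if upper else 1.0


# Toy incomplete information table: object -> (attribute 1, attribute 2).
table = {
    "o1": ("high", "yes"),
    "o2": ("high", "*"),
    "o3": ("low", "no"),
    "o4": ("*", "no"),
}

target = {"o1", "o2"}
lower, upper = approximations(table, target)
print(sorted(lower))            # ['o1']
print(sorted(upper))            # ['o1', 'o2', 'o4']
print(accuracy(lower, upper))   # one third
```

In this toy table o1 is tolerant with o2 and o2 with o4, yet o1 is not tolerant with o4. The relation is therefore not transitive, and the neighborhoods overlap to form a covering rather than a partition.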
Keywords/Search Tags: Rough Sets, Data Mining, Incomplete Information System, Granular Computing