
Outlier Mining Of Book Selling Information Based On Rough Set

Posted on: 2011-07-09
Degree: Master
Type: Thesis
Country: China
Candidate: X J Chen
Full Text: PDF
GTID: 2178360302988342
Subject: Computer system architecture
Abstract/Summary:
In this paper, the currently prevailing attribute reduction algorithm based on the rough set is applied to detecting and analyzing outliers in book-selling data. Outlier mining is a sub-branch of data mining that has been applied in a great many fields, where the mined outliers, rather than being treated as noise and discarded, have real value and applicability. An outlier mining algorithm based on dissimilarity is designed. Its basic idea is as follows: first, the positive region reduction algorithm is used to extract a relative reduct of the book data set and eliminate redundant attributes; second, a dissimilarity formula, which accelerates detection, is used to detect the outliers.
The main research covers an introduction to prevailing rough set theory and an analysis of three major rough-set-based attribute reduction algorithms: those based on the discernibility matrix, on information entropy, and on the algebraic form. The positive region attribute reduction algorithm is adopted in this paper because it is closer to the essence of rough set reduction and is algorithmically simple and easy to understand.
The pros and cons of various outlier mining models are studied in depth, and an outlier mining algorithm based on dissimilarity is designed, in which positive region attribute reduction of the rough set is used to transform the high-dimensional data set into a low-dimensional one.
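
The positive region reduction step can be illustrated with a minimal sketch. It is a textbook-style backward elimination, not the exact procedure of the thesis, which the abstract does not spell out. The attribute names (`category`, `price_band`, `binding`, `sales_level`) and the sample rows are hypothetical.

```python
from collections import defaultdict

def positive_region(rows, attrs, decision):
    """Indices of rows whose equivalence class under `attrs` has one decision value."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    pos = set()
    for members in blocks.values():
        if len({rows[i][decision] for i in members}) == 1:
            pos.update(members)
    return pos

def reduce_attributes(rows, attrs, decision):
    """Drop each attribute whose removal leaves the positive region unchanged."""
    full = positive_region(rows, attrs, decision)
    kept = list(attrs)
    for a in attrs:
        trial = [b for b in kept if b != a]
        if positive_region(rows, trial, decision) == full:
            kept = trial  # attribute `a` is redundant relative to the decision
    return kept

# Hypothetical book-selling records (illustrative only).
rows = [
    {"category": "fiction", "price_band": "low",  "binding": "paper", "sales_level": "high"},
    {"category": "fiction", "price_band": "high", "binding": "paper", "sales_level": "low"},
    {"category": "science", "price_band": "low",  "binding": "hard",  "sales_level": "high"},
    {"category": "science", "price_band": "high", "binding": "hard",  "sales_level": "low"},
    {"category": "fiction", "price_band": "low",  "binding": "hard",  "sales_level": "high"},
]
reduct = reduce_attributes(rows, ["category", "price_band", "binding"], "sales_level")
print(reduct)
```

The result of a greedy elimination like this depends on the order in which attributes are tried, so it yields one relative reduct rather than all of them.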
Meanwhile, the advantage of the mining algorithm based on dissimilarity is demonstrated by analyzing the shortcomings of the study by Tu Lihong and Yang Liping on isolated points (outliers) based on dissimilarity.
To make the system more flexible, users can customize the threshold and restrict its range of values: the smaller the threshold, the more accurate the outlier records obtained, and vice versa. The system shows a degree of flexibility and practicality when applied to the book-selling data set.
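
A user-adjustable threshold of this kind can be sketched as follows. The abstract does not give the dissimilarity formula, so this example assumes a simple-matching dissimilarity over categorical attributes and scores each record by its mean dissimilarity to all others. The comparison direction (`score > threshold`) is likewise an assumption and may differ from the thesis. All names and data are hypothetical.

```python
def dissimilarity(a, b, attrs):
    """Assumed measure: fraction of attributes on which two records differ."""
    return sum(a[k] != b[k] for k in attrs) / len(attrs)

def outlier_scores(records, attrs):
    """Mean dissimilarity of each record to every other record (needs >= 2 records)."""
    n = len(records)
    return [
        sum(dissimilarity(records[i], records[j], attrs) for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]

def find_outliers(records, attrs, threshold):
    """Indices of records whose score exceeds the user-chosen threshold."""
    return [i for i, s in enumerate(outlier_scores(records, attrs)) if s > threshold]

# Hypothetical records after attribute reduction (illustrative only).
records = [
    {"price_band": "low",  "binding": "paper"},
    {"price_band": "low",  "binding": "paper"},
    {"price_band": "low",  "binding": "paper"},
    {"price_band": "high", "binding": "hard"},
]
attrs = ["price_band", "binding"]
print(find_outliers(records, attrs, threshold=0.5))
```

Scoring every pair costs quadratic time in the number of records, which is one reason reducing the attribute set first is attractive.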
Keywords/Search Tags: rough set, dissimilarity, outlier mining, attribute reduction, data concerning selling