
Outlier Mining Algorithms Based on Information Entropy and Subspaces

Posted on: 2010-06-08 | Degree: Master | Type: Thesis
Country: China | Candidate: H Zhang
GTID: 2208360278976172 | Subject: Computer software and theory
Abstract/Summary:
The task of outlier mining is to discover exceptional, interesting, sparse, and isolated patterns concealed in massive data sets. Because outlier mining can uncover knowledge that is genuine but unexpected, studying outlier mining methods in order to find abnormal behaviors and patterns is of real significance. Traditional outlier mining methods are easily affected by man-made factors, and the outliers they find cannot be analyzed further. This thesis adopts information entropy as a way of measuring outlier data and studies outlier mining methods built on it. The main contributions are as follows:

1) A new outlier mining algorithm based on information entropy is presented, using an outlier measure factor derived from information entropy. The algorithm computes the outlier measure factor of each record from information entropy and then detects outliers from the values of this factor, which eliminates the influence of man-made factors on outlier mining. Defining outliers in terms of the outlier measure factor also explains what makes each outlier anomalous. Experiments on UCI data sets and on high-dimensional star spectrum data show that the algorithm is feasible and effective.

2) An outlier mining algorithm based on the characteristic attribute subspace is proposed. First, attribute entropy and characteristic attributes are defined and used to construct the characteristic attribute subspace and the corresponding attribute weights. Second, a subspace outlier influence factor is computed from each record's degree of abnormality, and outliers are identified from this factor. Finally, experimental results show that the algorithm is feasible and effective: it does not depend on user-supplied parameters and is highly flexible.
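To make the two scoring ideas more concrete, here is a minimal Python sketch for categorical records. The abstract does not give the thesis's formulas, so everything below is an assumption rather than the author's method. `entropy_outlier_factor` scores a record by the summed self-information -log p of its attribute values, so rare values push the score up. `subspace_outlier_factor` treats attributes whose normalized entropy is at or below the mean as hypothetical "characteristic attributes", weights them by 1 minus their normalized entropy, and sums the weighted self-information over that subspace only. The function names, the mean-entropy cutoff, and the weighting rule are all illustrative choices.

```python
import math
from collections import Counter

# Hypothetical sketch: the thesis's exact definitions are not given in the abstract.

def attribute_stats(records):
    """Return per-attribute value probabilities and normalized Shannon entropy (0..1)."""
    n = len(records)
    probs, norm_entropy = [], []
    for j in range(len(records[0])):
        counts = Counter(r[j] for r in records)
        p = {v: c / n for v, c in counts.items()}
        h = -sum(q * math.log(q) for q in p.values())
        # Divide by log(#distinct values); a constant attribute keeps entropy 0.
        h_max = math.log(len(p)) if len(p) > 1 else 1.0
        probs.append(p)
        norm_entropy.append(h / h_max)
    return probs, norm_entropy


def entropy_outlier_factor(records):
    """Assumed measure factor: sum of -log p(value) over all attributes of a record."""
    probs, _ = attribute_stats(records)
    return [sum(-math.log(probs[j][v]) for j, v in enumerate(r)) for r in records]


def subspace_outlier_factor(records):
    """Assumed subspace factor: weighted -log p(value) over low-entropy attributes only."""
    probs, h = attribute_stats(records)
    mean_h = sum(h) / len(h)
    # Hypothetical "characteristic attributes": normalized entropy at or below the mean,
    # so no user-supplied threshold is needed.
    subspace = [j for j, hj in enumerate(h) if hj <= mean_h]
    # Lower-entropy (more concentrated) attributes get larger weights,
    # normalized to sum to 1 whenever any weight is nonzero.
    raw = {j: 1.0 - h[j] for j in subspace}
    total = sum(raw.values()) or 1.0
    weights = {j: w / total for j, w in raw.items()}
    scores = [sum(weights[j] * -math.log(probs[j][r[j]]) for j in subspace) for r in records]
    return scores, subspace


records = [
    ("red", "small", "round"),
    ("red", "small", "round"),
    ("red", "small", "square"),
    ("red", "small", "round"),
    ("blue", "large", "square"),
]
ef = entropy_outlier_factor(records)
sf, subspace = subspace_outlier_factor(records)
print(subspace)                                 # → [0, 1]
print(max(range(len(ef)), key=ef.__getitem__))  # → 4
print(max(range(len(sf)), key=sf.__getitem__))  # → 4
```

Using the mean normalized entropy as the cutoff keeps the sketch free of user-set thresholds, in the spirit of the parameter independence the abstract claims. A full method would still need some rule for turning the scores into a final list of outliers.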
Keywords/Search Tags: Outlier, Information entropy, Outlier measure factor, Characteristic attribute, Subspace