
Research On Outlier Mining And Intensional Knowledge Discovery

Posted on: 2009-09-05
Degree: Master
Type: Thesis
Country: China
Candidate: F N Lian
GTID: 2178360272490105
Subject: Computer software and theory

Abstract:
Data are among the most valuable resources in today's information society. Complex datasets hide a great deal of useful knowledge, and discovering and using that knowledge has become a precondition for sound scientific decision-making. Data mining extracts potentially useful, previously unknown information and knowledge from large, incomplete, and noisy datasets by means of techniques such as association rule mining, clustering, and classification.

Outlier mining is one of the important techniques in data mining. Outliers are observations that lie an abnormal distance from the other observations and do not conform to the common patterns or behaviors; they are often suspected of having been generated by a different mechanism. Outliers are not necessarily erroneous data: some of them contain important information, such as evidence of fraud, intrusion activity, or unusual consumer behavior. Studying outliers is therefore highly significant.

Outlier mining can be broken down into three questions:
① Which observations should be considered outliers?
② How can outliers be found effectively?
③ Why is an outlier exceptional? The answer to this question is called intensional knowledge.

At present, most outlier mining algorithms focus only on identifying outliers. They do not explain why an outlier is considered exceptional, even though this explanation matters both to users and to the purpose of outlier mining.

This thesis proposes an association space-based outlier mining algorithm. For an outlier, it finds the smallest attribute set that makes the observation exceptional and reports that set as intensional knowledge: these are the attributes that cause the observation to be an outlier. Specifically, the research covers the following aspects:
① Several key concepts and techniques of data mining are studied, including the applications and classification of data mining, data preprocessing, clustering, and association rules.
② The strengths and weaknesses of the k-means algorithm are discussed, several initialization methods are studied, and a novel initialization method is proposed.
③ The theories and methods of distance-based outlier mining are analyzed and summarized comprehensively. An outlier mining algorithm based on the sum of distances to the k nearest neighbors is designed, and a partition-based algorithm is introduced.
④ The FindNonTrivialOuts algorithm is investigated, and an association space-based outlier mining algorithm is proposed and verified experimentally.
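Aspect ② concerns how k-means is initialized. The thesis's own novel initialization method is not described in the abstract, so the sketch below does not reproduce it. Instead it shows one widely used alternative, k-means++ seeding, inside a plain Lloyd's k-means loop, to illustrate why initialization matters: seeds are spread out by drawing each new center with probability proportional to its squared distance from the nearest existing center. All function names and parameters here are illustrative assumptions.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_init(points: List[Point], k: int, rng: random.Random) -> List[Point]:
    """k-means++ seeding (not the thesis's method): the first center is chosen
    uniformly; each further center is drawn with probability proportional to
    its squared distance from the nearest center chosen so far."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(weights) == 0:
            # Every point coincides with a chosen center; fall back to uniform.
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


def nearest(p: Point, centers: List[Point]) -> int:
    """Index of the center closest to p."""
    return min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))


def mean(members: List[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(members) for coord in zip(*members))


def kmeans(points: List[Point], k: int, iters: int = 100, seed: int = 0):
    """Lloyd's k-means with k-means++ seeding. Returns (centers, labels)."""
    rng = random.Random(seed)
    centers = kmeanspp_init(points, k, rng)
    for _ in range(iters):
        # Assignment step: attach every point to its nearest center.
        labels = [nearest(p, centers) for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # An empty cluster keeps its previous center.
            new_centers.append(mean(members) if members else centers[j])
        if new_centers == centers:
            break  # centers no longer move, so assignments are stable
        centers = new_centers
    labels = [nearest(p, centers) for p in points]
    return centers, labels
```

For example, `kmeans([(0, 0), (0, 1), (1, 0), (100, 100), (100, 101), (101, 100)], k=2)` separates the two groups of three points. Plain random seeding can place both initial centers in the same group and, on less well-separated data, converge to a poor local optimum; reducing that risk is the usual motivation for smarter initialization.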
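Aspect ③ mentions an outlier mining algorithm based on the sum of distances to the k nearest neighbors. A minimal sketch of that scoring idea, assuming Euclidean distance and a "report the top n scores" rule (both assumptions, as are the function names), is shown below. It is the naive O(n²) version; the partition-based algorithm mentioned in the abstract, which would prune candidates, is not shown.

```python
import heapq
import math
from typing import List, Sequence


def knn_sum_scores(points: List[Sequence[float]], k: int) -> List[float]:
    """Outlier score of each point = sum of Euclidean distances to its
    k nearest neighbours (the point itself is excluded)."""
    if not 0 < k < len(points):
        raise ValueError("k must satisfy 0 < k < len(points)")
    scores = []
    for i, p in enumerate(points):
        dists = (math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(heapq.nsmallest(k, dists)))
    return scores


def top_n_outliers(points: List[Sequence[float]], k: int, n: int) -> List[int]:
    """Indices of the n points with the largest k-NN distance sums."""
    scores = knn_sum_scores(points, k)
    return sorted(range(len(points)), key=lambda i: scores[i], reverse=True)[:n]


pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
print(top_n_outliers(pts, k=2, n=1))  # → [4]
```

Summing over k neighbors, rather than using only the distance to the k-th neighbor, makes the score less sensitive to a single close neighbor, so a point that sits near one other stray point is still ranked as isolated.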
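The abstract describes intensional knowledge as the smallest attribute set that makes an observation exceptional, but it does not give the association space-based algorithm or FindNonTrivialOuts in detail. The sketch below is therefore not either of those algorithms; it is a brute-force illustration of the idea under assumed choices. A row counts as exceptional in an attribute subset if it is a distance-based DB(p, D) outlier there (here: at least a fraction p of the other rows lie farther than D from it, using Euclidean distance on those attributes only), and a subset is reported only if it contains no smaller reported subset. The function names and the p, D parameters are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple


def is_db_outlier(data: List[Sequence[float]], idx: int,
                  attrs: Tuple[int, ...], p: float, D: float) -> bool:
    """DB(p, D) test restricted to the attributes in attrs: True if at least
    a fraction p of the other rows lie farther than D from row idx."""
    target = [data[idx][a] for a in attrs]
    others = [row for j, row in enumerate(data) if j != idx]
    far = sum(1 for row in others
              if math.dist(target, [row[a] for a in attrs]) > D)
    return far >= p * len(others)


def minimal_outlying_subspaces(data: List[Sequence[float]], idx: int,
                               p: float, D: float) -> List[Tuple[int, ...]]:
    """Attribute subsets (by column index) in which row idx is an outlier
    and no proper subset is also an outlying subset."""
    n_attrs = len(data[0])
    found: List[frozenset] = []
    for size in range(1, n_attrs + 1):  # smallest subsets first
        for attrs in combinations(range(n_attrs), size):
            if any(f <= set(attrs) for f in found):
                continue  # a smaller outlying subset already explains this one
            if is_db_outlier(data, idx, attrs, p, D):
                found.append(frozenset(attrs))
    return [tuple(sorted(f)) for f in found]


# Columns: (age, income, purchases); the last row differs only in purchases.
rows = [(30, 50, 2), (31, 52, 3), (29, 49, 2),
        (30, 51, 3), (32, 50, 2), (30, 50, 40)]
print(minimal_outlying_subspaces(rows, 5, p=0.9, D=10))  # → [(2,)]
```

The answer `[(2,)]` is the kind of explanation the abstract calls intensional knowledge: the last row is exceptional because of the purchases column alone. Enumerating all subsets is exponential in the number of attributes, which is why practical algorithms prune the search space instead.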
Keywords: Outlier Mining, Intensional Knowledge, Association Space