
Research And Application Of Outlier Mining And Finding Intentional Knowledge

Posted on: 2006-06-29
Degree: Master
Type: Thesis
Country: China
Candidate: S L Lu
Full Text: PDF
GTID: 2168360155971499
Subject: Computer software and theory
Abstract/Summary:
The problem of outlier mining has been variously called outlier analysis, anomaly detection, exception mining, rare-event detection, rare-class mining, deviation detection, and so on. An outlier may be "dirty data", but it may also represent a meaningful real-world event. From the viewpoint of knowledge discovery, rare events are often more interesting and valuable than common ones; in many domains their importance is high relative to other events, which makes their detection and analysis extremely important. The main work of this thesis is as follows: (1) It summarizes the problem of outlier mining in terms of its practical meaning, algorithms, application areas, detection tools, and algorithm evaluation. (2) To overcome the need for a threshold in existing distance-based algorithms, it proposes a new definition of an outlier. The definition uses an object's distances to all other objects in the dataset to judge whether the object is an outlier, so the algorithm no longer needs the neighbour parameter p or k to be set. To improve efficiency, a sampling-based approximate detection algorithm has been developed. Experiments on real data indicate that the new definition not only yields the same results as DB(p,d) but also indicates each outlier's degree of outlyingness in the dataset, and that it simplifies the requirements for detecting outliers. (3) It studies the problem of local outliers in multi-dimensional, categorical-attribute datasets, gives a new definition of an outlier based on differences in the frequency of attribute values, and proposes an anomaly existence criterion for estimating the significance of a detected outlier. The experimental results indicate that the criterion can eliminate a large number of objects whose degree of deviation is not actually remarkable.
We evaluate the validity of our algorithm from four aspects: the interestingness of the results, comparison with similar algorithms, contribution to improving classification accuracy, and the ability to detect rare classes. (4) An experimental platform named SOD (Smart Outlier Detection) has been constructed; it integrates the algorithms proposed or improved in this thesis and provides a tool for outlier detection analysis. It can obtain data from several external data sources through its data access interface, which strengthens its practicability. SOD has been integrated with a teaching management system. (5) Taking into account the characteristics of the teaching management system, the thesis discusses the necessity of using outlier detection in the administrative system and provides some examples based on actual demand. Our purpose is to construct an experimental platform for mining outliers and finding intensional knowledge from real data. The thesis covers five aspects: a distance-based approximate algorithm that simplifies the requirement for setting a threshold; an algorithm that deals with categorical and high-dimensional data; an effective algorithm for mining exceptional rules and intensional knowledge; an algorithm for mining anomalous patterns in static time-series data in which the subsequences are of equal length; and a software platform that integrates the above four algorithms. The ultimate purpose of outlier detection lies in its application. This thesis makes a useful discussion of, and attempt at, the application of outlier mining in the teaching management system.
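The threshold-free, distance-based idea in point (2) can be illustrated with a minimal sketch. The abstract does not give the exact formula, so everything here is an assumption: the score is taken to be the sum of Euclidean distances from an object to all other objects, the sampling-based approximation is taken to be the mean distance to a random sample, and the function names `distance_sum_scores` and `sampled_scores` are hypothetical. It is not the thesis's algorithm, only one plausible reading of it.

```python
import math
import random


def distance_sum_scores(points):
    """Assumed exact score: sum of Euclidean distances to every point.

    The distance from a point to itself is 0, so including it in the
    sum does not change the result. No p, k, or d threshold is needed.
    """
    return [sum(math.dist(p, q) for q in points) for p in points]


def sampled_scores(points, sample_size, seed=0):
    """Assumed approximation: mean distance to a random sample of points.

    This reduces the cost from O(n^2) to O(n * sample_size) distance
    computations, at the price of an approximate ranking.
    """
    rng = random.Random(seed)
    sample = rng.sample(points, min(sample_size, len(points)))
    return [sum(math.dist(p, q) for q in sample) / len(sample) for p in points]


# Toy data (invented for illustration): four clustered points, one isolated.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]

exact = distance_sum_scores(points)
approx = sampled_scores(points, sample_size=3)

# Rank objects by score; a larger score means a more outlying object.
most_outlying = max(range(len(points)), key=exact.__getitem__)
print(points[most_outlying])
```

On this toy data the isolated point (5.0, 5.0) receives the largest score under both the exact and the sampled variant. Because the score is a continuous quantity rather than a yes/no decision, it can also serve as a degree of outlyingness for ranking, which matches the abstract's description only in spirit.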
Keywords/Search Tags: Outlier, Anomaly Detection, Frequent Itemset, Exceptional Rule, Teaching Management System