The Research Of Outlier Detection Based On R And Its Application In Fraudulent Trading Detection Of Xidian University

Posted on: 2016-09-18
Degree: Master
Type: Thesis
Country: China
Candidate: J F Qu
Full Text: PDF
GTID: 2348330488974139
Subject: Computer application technology
Abstract/Summary:
Data storage technology in the Information Age makes it possible to store massive amounts of data. Data mining, a rising research field, draws on abundant data, complex data types and modeling techniques to address information overload, making it possible to build models on massive data sets and extract useful knowledge from them. Data mining technology, which spans disciplines such as statistics, artificial intelligence, machine learning, pattern recognition and databases, is therefore widely used in many areas.

Outlier detection, an important branch of data mining, aims to find the small portion of objects in a data set that deviate strongly from the remaining data, and to obtain useful knowledge by analysing them. As the saying goes, one person's noise may be another person's signal: the very few unusual observations may conceal the information we are interested in, or may indicate greater research value. Research on and improvement of outlier detection algorithms is therefore of great theoretical value and practical significance.

This dissertation first analyses the main kinds of outlier detection algorithms from the perspective of machine learning and, on that basis, studies a case of fraudulent trading detection. The main work is as follows:

(1) A thorough study of outlier detection algorithms in data mining, including statistics-based, distance-based, density-based, deviation-based and cluster-based algorithms.
(2) From the perspective of machine learning, outlier detection models are introduced by category: unsupervised, semi-supervised and supervised learning methods. The general principles of model evaluation and the experimental method are then presented. Specifically, evaluation indicators for models that detect fraudulent trading are given: the lift chart, the PR curve and the normalized distance to the standard price.
(3) Models from the three aforementioned categories, namely LOF, Naive Bayes, AdaBoost.M1 and a semi-supervised self-training model, are designed and implemented in R and applied to detecting fraudulent trading at one company. Following the Cross-Industry Standard Process for Data Mining (CRISP-DM), the phases of data preprocessing, modeling, and model analysis and evaluation are carried out, and the final experimental results are given.
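To make the density-based idea behind LOF (local outlier factor) concrete, here is a minimal sketch. It is not the thesis's implementation: the thesis reports building its models in R, whereas this illustration is written in plain Python. The function names `k_nearest` and `lof_scores`, the value `k=3`, and the sample points are all assumptions made up for the example. The sketch also simplifies the usual definition by taking exactly k neighbours and ignoring distance ties.

```python
import math

def k_nearest(points, i, k):
    """Return the k nearest neighbours of points[i] as (distance, index) pairs."""
    dists = sorted(
        (math.dist(points[i], points[j]), j)
        for j in range(len(points)) if j != i
    )
    return dists[:k]  # simplification: ties at the k-th distance are cut off

def lof_scores(points, k=3):
    """Compute a local outlier factor for every point (hypothetical helper)."""
    n = len(points)
    neighbours = [k_nearest(points, i, k) for i in range(n)]
    # k-distance: distance from each point to its k-th nearest neighbour
    k_dist = [nb[-1][0] for nb in neighbours]

    # Local reachability density: inverse of the mean reachability distance,
    # where reach_dist(i, j) = max(k_dist[j], d(i, j)).
    # Assumes no cluster of exact duplicates, which would make the sum zero.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: average density of the neighbours relative to the point's own density.
    return [
        sum(lrd[j] for _, j in neighbours[i]) / (k * lrd[i])
        for i in range(n)
    ]

# Made-up data: five points in a tight group and one far away.
points = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (1.0, 1.2), (1.2, 1.0), (5.0, 5.0)]
scores = lof_scores(points, k=3)
most_unusual = max(range(len(points)), key=lambda i: scores[i])
print(most_unusual, [round(s, 2) for s in scores])
```

Scores close to 1 mean a point is about as dense as its neighbours, while scores well above 1 mark candidates for inspection. In a fraud-detection setting such a ranking would typically be fed into evaluation tools like the lift chart and PR curve mentioned above.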
Keywords/Search Tags: Data Mining, Outlier Detection, Fraudulent Trading Detection, R, Model Evaluation