
Statistic Modeling And Algorithm Research In Data Mining

Posted on: 2006-01-31
Degree: Master
Type: Thesis
Country: China
Candidate: L Yang
Full Text: PDF
GTID: 2178360182477337
Subject: Computational Mathematics
Abstract/Summary:
Data mining (DM) is currently a hot topic in both theoretical research and practice. It is the discipline of finding new and useful relationships in large volumes of data, using methods from pattern recognition, statistics, and mathematics. Drawing on many fields, including database technology, statistics, information science, machine learning, and visualization, data mining is an interdisciplinary science: not simply a technology or a piece of software, but a comprehensive application of several disciplines. A large part of it rests on statistical analysis, such as regression and time series methods, which are widely applied in forecasting models. One of its difficulties is outlier mining, which is partly addressed by a newly emerging subject, statistical diagnostics. In this paper, we study theoretical aspects of data mining, such as outlier mining and influential point mining, from a statistical perspective.

We arrange this paper as follows:

In the first chapter, we introduce the development of data mining and statistical diagnostics, and summarize our main results.

In chapter 2, we present the preliminaries used in later chapters, including data mining, the linear model, and statistical diagnostics.

In chapter 3, we study influential point mining for the linear model under an elliptic restriction, and give the corresponding statistics and the outlier mining algorithm.

In chapter 4, we study outlier mining when the parameter estimate of the given linear model is not the Least Square Estimate (LSE) but the Uniform Biased Estimate (UBE), and present a Cook distance based on the Uniform Biased Estimate, which we use as an important tool for mining influential points.

In the final chapter, we mine stock trading data using time series methods, identify the model and the outliers in the data, and present a more accurate forecasting model and outlier mining method.

Finally, we conclude the paper with some prospects for data mining.
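For orientation, the Cook distance mentioned for chapter 4 can be compared with the classical Cook's distance for a least-squares fit. The sketch below shows only that classical LSE-based form, D_i = e_i^2 h_ii / (p s^2 (1 - h_ii)^2). The abstract does not give the UBE-based variant, so it is not reproduced here. The function name `cooks_distance` and the toy data are assumptions made for illustration.

```python
import numpy as np

def cooks_distance(X, y):
    """Classical Cook's distance for an ordinary least-squares fit.

    X: (n, p) design matrix (include a column of ones for an intercept).
    y: (n,) response vector.
    Note: this is the standard LSE-based form, not the thesis's UBE-based one.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    # Least-squares estimate and residuals.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # Leverages: diagonal of the hat matrix H = X X^+ (X^+ is the pseudoinverse).
    h = np.einsum("ij,ji->i", X, np.linalg.pinv(X))
    # Unbiased estimate of the residual variance.
    s2 = resid @ resid / (n - p)
    return resid**2 * h / (p * s2 * (1.0 - h) ** 2)

# Toy data (hypothetical): a straight line with the last point shifted upward.
x = np.arange(10.0)
y = 1.0 + 2.0 * x
y[9] += 10.0
X = np.column_stack([np.ones_like(x), x])
d = cooks_distance(X, y)
print(int(np.argmax(d)))  # index of the most influential observation
```

Observations with a large distance relative to the rest are candidates for influential points; the cutoff is a modelling choice and is not specified in the abstract.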
Keywords/Search Tags: data mining, outlier mining, influence detection, time series