The Approving Study Of Sampling Technology Used In Data Mining Area

Posted on: 2011-08-27
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Xie
Full Text: PDF
GTID: 2178360305968829
Subject: Statistics
Abstract/Summary:
Sampling techniques have not attained the essential position in data mining that they hold in statistics, where sampling is one of the most important methods of analysis. Some scholars have argued that parallelization and chunking algorithms outperform sampling on very large data sets. In practice, however, we found that when a data set contains from ten thousand to a hundred thousand records, sampling offers advantages that other methods cannot match: faster execution, higher accuracy, and easier implementation. Furthermore, real-world application differs from scientific research: the ultimate goal is to obtain mining results quickly and accurately in support of decisions and policy. Outside the mining laboratory, rebuilding and restructuring a complicated analysis environment costs more than it gains.
To study further the feasibility of applying sampling in data mining, this paper proceeds from both theoretical argument and case studies. On the theoretical side, it defines the overall KDD process; summarizes how far sampling technology is currently used in mining; reviews the three most widely used mining methods and elaborates on how sampling combines with each; and presents two new sampling methods for mining, an improved static sampling algorithm and an improved progressive sampling algorithm. On the applied side, it discusses the current extent to which sampling and mining are combined in finance, insurance, retail, manufacturing, and other sectors. The new methods are also tested to compare performance before and after the improvement.
The main contributions of this paper are as follows: (1) An improved static sampling algorithm and an improved progressive sampling algorithm are proposed. (2) Three data mining methods (classification, association, and clustering) based on the improved sampling are designed and implemented in WEKA, and the merits of these algorithms are established. (3) A new evaluation method for association algorithms is proposed and implemented.
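The abstract does not spell out the improved algorithms, so the following is only a minimal sketch of generic progressive sampling, not the thesis's method. It assumes a geometric schedule of sample sizes and a simple convergence test: stop when the evaluated score changes by less than a tolerance between two consecutive sample sizes. The names `geometric_schedule`, `progressive_sample`, and `item_support`, as well as the parameter values, are hypothetical and chosen for illustration.

```python
import random


def geometric_schedule(n0, factor, n_max):
    """Yield sample sizes n0, n0*factor, ... and finally n_max."""
    n = n0
    while n < n_max:
        yield n
        n = int(n * factor)
    yield n_max


def progressive_sample(data, evaluate, n0=1000, factor=2, tol=0.005, seed=0):
    """Grow a random sample until the score stabilises.

    Returns (sample_size_used, score). If the score never stabilises,
    the whole data set is used.
    """
    rng = random.Random(seed)
    shuffled = data[:]          # copy so the caller's list is untouched
    rng.shuffle(shuffled)       # a prefix of a shuffled list is a random sample
    prev = None
    for n in geometric_schedule(n0, factor, len(shuffled)):
        score = evaluate(shuffled[:n])
        if prev is not None and abs(score - prev) < tol:
            return n, score     # converged: a larger sample changes little
        prev = score
    return len(shuffled), prev


def item_support(records):
    """Fraction of records that contain the item (encoded as 1)."""
    return sum(records) / len(records)


# Illustrative data: 50,000 records, exactly 30% of which contain the item.
data = [1] * 15000 + [0] * 35000
n_used, support = progressive_sample(data, item_support)
print(n_used, round(support, 3))
```

In this sketch the score is the support of a single item, the quantity that association mining estimates. A classifier's accuracy or a clustering quality measure could be passed as `evaluate` in the same way. A static sampling approach would instead fix the sample size once, before any mining is run.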
Keywords/Search Tags: Static sampling, progressive sampling, classification, association, clustering, ability evaluation