
Study On The Method Of Outlier Detection And Its Application In Machine Learning

Posted on: 2019-05-31
Degree: Master
Type: Thesis
Country: China
Candidate: Z Z Wang
Full Text: PDF
GTID: 2348330542458790
Subject: Mathematics
Abstract/Summary:
In data mining, analyzing the structural features of the data space is essential, and outlier detection is key to understanding how features are distributed in that space. The main approaches are: statistical methods, which define a test statistic and a significance level; distance-based methods, which use a distance metric on a given feature space to examine each point's neighborhood; density-based methods, which determine local reachability and the corresponding density parameters; and clustering-based methods, which apply unsupervised clustering and then judge points relative to the cluster centers. For high-dimensional data, however, outlier detection based on objects and on the structure of the data space remains a weak link in the data mining process.

This thesis carries out lithology attribute analysis and frequent pattern mining on well-logging data, uses outlier detection to update the data and its classification, and then applies machine learning classification to predict lithology. First, the physical properties of the logging data are analyzed and logging parameters are selected to provide the data basis for sensitivity analysis and correlation analysis. Sensitivity analysis of the multidimensional attribute space of the logging data is then performed from four perspectives: single variables, data subspaces, cross-validation of classification and clustering, and information entropy. Based on the correlation analysis of each variable and subspace, a Karhunen-Loève orthogonal transformation is applied to find the set of orthogonal vectors that maximizes the difference between classes.

Second, the filtered logging data are encoded and discretized, a minimum support threshold minsup is set, and frequent pattern mining is performed to find the corresponding frequent patterns. Each sample in the original data set is then voted on according to the frequent patterns; frequent-pattern outliers are defined and ranked, and outliers are detected according to the minimum support threshold. Points below the threshold are removed, and the data sets and their classifications are updated in light of the previously detected collective outliers and contextual outliers. The outliers found by frequent pattern mining are embedded in machine learning to limit their influence on semi-supervised computation. Finally, the prediction results are obtained and evaluated, and the machine learning model is improved through outlier mining based on frequent patterns. Lithology identification in block 41-33 of the Sulige gas field shows that analyzing the spatial structure of the data, together with pattern mining and outlier constraints applied to the corresponding model, helps optimize the logging-parameter lithology identification model and further improves prediction accuracy.
Keywords/Search Tags: outlier detection, frequent pattern mining, machine learning, lithology identification
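
The frequent-pattern voting step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact scoring rule, so this sketch assumes one common formulation: each sample's score is the summed support of the frequent patterns it contains, divided by the number of frequent patterns, and low scores indicate likely outliers. The function names, the toy discretized rows, the `max_len` limit, and the brute-force mining are all illustrative assumptions, not the thesis's implementation.

```python
from itertools import combinations


def mine_frequent_patterns(rows, minsup, max_len=2):
    """Brute-force frequent itemset mining (hypothetical helper).

    Each row is a tuple of discretized codes; an item is a
    (column_index, code) pair. Returns the transactions and a dict
    mapping each frequent itemset to its support.
    """
    n = len(rows)
    transactions = [frozenset(enumerate(row)) for row in rows]
    counts = {}
    for t in transactions:
        items = sorted(t)  # fixed order so each itemset has one key
        for k in range(1, max_len + 1):
            for combo in combinations(items, k):
                counts[combo] = counts.get(combo, 0) + 1
    patterns = {
        frozenset(combo): cnt / n
        for combo, cnt in counts.items()
        if cnt / n >= minsup
    }
    return transactions, patterns


def pattern_scores(transactions, patterns):
    """Score each sample by the frequent patterns it contains.

    Assumed scoring rule: sum of supports of contained patterns,
    normalized by the total number of frequent patterns.
    """
    total = len(patterns)
    scores = []
    for t in transactions:
        s = sum(sup for p, sup in patterns.items() if p <= t)
        scores.append(s / total if total else 0.0)
    return scores


# Toy discretized samples (made-up codes, not real logging data).
rows = [
    ("hi", "lo", "lo"),
    ("hi", "lo", "lo"),
    ("hi", "lo", "lo"),
    ("hi", "lo", "mid"),
    ("lo", "hi", "hi"),
]
transactions, patterns = mine_frequent_patterns(rows, minsup=0.6)
scores = pattern_scores(transactions, patterns)

# Rank samples from most to least outlying (lowest score first).
ranking = sorted(range(len(rows)), key=lambda i: scores[i])
```

In this toy data the last row shares no frequent pattern with the others, so it ranks first as the most outlying sample. A real pipeline would replace the brute-force enumeration with a scalable miner such as Apriori or FP-growth, and the cutoff for removing samples would need to be chosen for the data at hand.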