Research And Application Of Outlier Data Mining Algorithm Based On Deep Forest

Posted on:2022-02-16

Degree:Master

Type:Thesis

Country:China

Candidate:R F Li

GTID:2518306521994969

Subject:Computer technology

Abstract/Summary:

With the rapid development of big data, outlier mining, an important research branch of data mining, plays a vital role in many scenarios and can help uncover valuable knowledge and abnormal patterns. The deep forest algorithm can effectively mine different types of data in a data set, but it has three problems: the features used to build its subtrees are selected at random, irrelevant attributes among the data features may degrade its performance, and its time complexity is relatively high. To address these problems, this thesis studies outlier mining using ideas such as weight factors and simple random sampling. The main results are as follows:

(1) An outlier data mining algorithm based on weighted deep forest (WDF). To address the randomness of subtree feature selection in deep forest, a method for constructing a weighted deep forest is given and applied to outlier detection. First, a weight factor μ is defined from the predicted probabilities of a forest; it reflects how accurate that forest is in the current layer. Second, μ is used as the weight of each forest in the cascade layer, which reduces the influence that the random selection of root-node features has on the algorithm's performance. In addition, according to the different distributions of data samples, the local isolation factor α is redefined to describe the degree to which a data point is an outlier. On this basis, an outlier data mining algorithm based on weighted deep forest is given. Finally, experimental results show that the algorithm achieves higher mining quality in outlier mining than similar algorithms.

(2) A fast weighted deep forest outlier data mining algorithm based on multiple sampling (FWDF). To address the heavy feature repetition that arises when deep forest uses a sliding window to transform data, a fast construction method for weighted deep forest based on multiple sampling is presented and applied to outlier detection. First, in multi-granularity scanning, the input features are randomly selected according to the window size, and sub-sampling is then used to extract data instances; this gives a fast deep forest construction method (FDF). Second, the FDF algorithm is integrated into the WDF algorithm to detect outliers. On this basis, a fast weighted deep forest outlier data mining algorithm based on multiple sampling is given. Experiments show that this algorithm mines more efficiently than the WDF algorithm.

(3) A prototype outlier data mining system for stellar spectra. Developed under Windows 7 in Python, the system implements preprocessing functions such as feature selection, missing-value handling and feature weighting, as well as FWDF-based outlier detection and result visualization. Finally, use-case tests show that the system is feasible, providing an effective way to explore unknown and rare targets in a specific context.
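The abstract does not state the exact formula for the weight factor μ, so the following is only a minimal sketch of the weighting idea in WDF under explicit assumptions: μ for each forest is taken to be the mean probability it assigns to the true class of labeled validation samples (0 = normal, 1 = outlier), and the forests of one cascade layer are combined by a μ-weighted average of their class-probability vectors. The names `forest_weight` and `weighted_layer_output` are hypothetical, and the local isolation factor α is not modeled because its definition is not given.

```python
from typing import List, Sequence

# One probability matrix per forest: rows are samples, columns are classes
# (assumed here to be 0 = normal, 1 = outlier).
ProbMatrix = List[List[float]]


def forest_weight(probs: ProbMatrix, labels: Sequence[int]) -> float:
    """Hypothetical weight factor mu for a single forest.

    Assumption: mu is the mean probability the forest assigns to the true
    class of each labeled validation sample, used as a proxy for accuracy.
    """
    return sum(row[y] for row, y in zip(probs, labels)) / len(labels)


def weighted_layer_output(
    forest_probs: List[ProbMatrix], labels: Sequence[int]
) -> ProbMatrix:
    """Combine the forests of one cascade layer by a mu-weighted average."""
    mus = [forest_weight(p, labels) for p in forest_probs]
    total = sum(mus)
    weights = [m / total for m in mus]  # normalize so the weights sum to 1

    n_samples, n_classes = len(forest_probs[0]), len(forest_probs[0][0])
    combined = [[0.0] * n_classes for _ in range(n_samples)]
    for w, probs in zip(weights, forest_probs):
        for i in range(n_samples):
            for c in range(n_classes):
                combined[i][c] += w * probs[i][c]
    return combined
```

For example, with two validation samples labeled `[0, 1]`, a forest whose true-class probabilities average 0.85 receives a larger share of the layer's vote than one averaging 0.55, so the combined output leans toward the more accurate forest instead of treating every randomly built forest equally.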
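The contrast between standard sliding-window multi-granularity scanning and the sampled scanning behind FDF can be sketched as follows. This is an illustration under stated assumptions, not the thesis's implementation: it assumes that "randomly selecting input features according to the window size" means drawing a few random window start positions instead of using every position, and that sub-sampling means simple random sampling of instances without replacement. `sliding_windows`, `sampled_windows`, `n_windows` and `sample_rate` are hypothetical names.

```python
import random
from typing import List, Sequence, Tuple


def sliding_windows(x: Sequence[float], w: int) -> List[List[float]]:
    """Standard multi-granularity scanning: every contiguous window of size w."""
    return [list(x[i:i + w]) for i in range(len(x) - w + 1)]


def sampled_windows(
    data: Sequence[Sequence[float]],
    w: int,
    n_windows: int,
    sample_rate: float,
    seed: int = 0,
) -> Tuple[List[int], List[int], List[List[List[float]]]]:
    """Hypothetical FDF-style scanning with two sampling steps.

    Assumption 1: only n_windows window start positions are drawn at random
    instead of sliding over every position.
    Assumption 2: a fraction sample_rate of the instances is kept by simple
    random sampling without replacement.
    Returns (kept row indices, window start positions, windows per kept row).
    """
    rng = random.Random(seed)
    n_features = len(data[0])
    starts = sorted(rng.sample(range(n_features - w + 1), n_windows))
    n_keep = max(1, int(len(data) * sample_rate))
    rows = sorted(rng.sample(range(len(data)), n_keep))
    windows = [[list(data[r][s:s + w]) for s in starts] for r in rows]
    return rows, starts, windows
```

With 10 instances of 8 features and a window of size 3, full scanning produces 6 overlapping windows per instance (60 in total), whereas drawing 2 window positions for half of the instances produces 10. This illustrates the kind of reduction in repeated features and training instances that the FDF step targets.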

Keywords/Search Tags:

Outlier detection, Deep forest, Outlier factor, Stellar spectrum, Multiple sampling

Related items

1	Research And Application Outlier Detection Method Based On Density&Distance
2	Research And Application Of Outlier Detection Algorithm
3	Outlier Mining Method Based On Gini Indexes And Sub-space Research
4	Research On Anomaly Detection Model Based On Deep Variational Learning
5	Improvement Of Density-Based Local Outlier Detection Algorithm
6	Research On Local Outlier Detection Algorithm
7	An Improvement To The Angle-based Outlier Detection Algorithm
8	Outlier Mining Algorithm Research And Application
9	Research On Algorithms For Outlier Detection
10	The Outlier Detection Algorithm Based On Decision Outlier Factor And Markov Model