
Research And Application Of Outlier Data Mining Algorithm Based On Deep Forest

Posted on: 2022-02-16
Degree: Master
Type: Thesis
Country: China
Candidate: R F Li
GTID: 2518306521994969
Subject: Computer technology
Abstract/Summary:
With the rapid development of big data, outlier mining, an important research branch of data mining, plays a vital role in many scenarios and can help discover valuable knowledge and abnormal patterns. The deep forest algorithm can effectively mine different types of data in a data set, but it has three problems: subtree features are selected at random, irrelevant attributes among the data features may degrade its performance, and its time complexity is relatively high. To address these problems, this thesis applies weight factors, simple random sampling and related ideas to outlier mining. The main results are as follows:

(1) Outlier data mining algorithm based on weighted deep forest (WDF). To address the randomness of subtree feature selection in deep forest, a construction method for a weighted deep forest is given and applied to outlier detection. First, a weight factor is defined from the predicted probability of the forest; it describes the accuracy of the current layer in the forest. Second, the weight factor is used as the weight of each forest in the cascade layer, in order to reduce the influence that the random selection of root-node features in the forest has on the performance of the algorithm. In addition, the local isolation factor is redefined according to the distribution of the data samples, so that it describes the degree to which a data object is an outlier. On this basis, an outlier data mining algorithm based on weighted deep forest is given. Finally, experimental results show that the algorithm achieves higher mining quality in outlier mining than similar algorithms.

(2) Fast weighted deep forest outlier data mining algorithm based on multiple sampling (FWDF). To address the high feature redundancy that arises when the deep forest algorithm uses a sliding window for data conversion, a fast construction method for weighted deep forest based on multiple sampling is presented and applied to outlier detection. First, in multi-granularity scanning, the input features are randomly selected according to the window size, and sub-sampling is then used to extract data instances, which yields a fast deep forest construction method (FDF). Second, the FDF algorithm is integrated into the WDF algorithm to detect outliers. On this basis, a fast weighted deep forest outlier data mining algorithm based on multiple sampling is given. Experiments show that this algorithm has higher mining efficiency than the WDF algorithm.

(3) A prototype outlier data mining system for stellar spectra is designed and implemented. Developed under Windows 7 with Python as the development language, the system implements preprocessing functions such as feature selection, missing value processing and feature weighting, as well as outlier detection with the FWDF algorithm, result visualization and other functions. Finally, use case test results show that the system is feasible, and that it provides an effective way to explore unknown and rare targets in a specific context.
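The abstract does not give the formulas behind the weight factor or the sampling steps, so the following is only a minimal sketch under stated assumptions, not the thesis's implementation. It assumes each forest's weight is the mean probability it assigns to the true class, normalized across the forests of one cascade layer, and that multiple sampling means drawing random window positions and a random subset of instances. All function names (`forest_weights`, `weighted_layer_output`, `sample_windows`, `subsample`) are hypothetical.

```python
import random


def forest_weights(forest_probs, labels):
    """Assumed weight factor: mean probability each forest gives the true class,
    normalized so the weights of one cascade layer sum to 1."""
    raw = [
        sum(p[y] for p, y in zip(probs, labels)) / len(labels)
        for probs in forest_probs
    ]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_layer_output(forest_probs, weights):
    """Combine the per-forest class-probability vectors of a cascade layer
    into one weighted class-probability vector per sample."""
    n_samples = len(forest_probs[0])
    n_classes = len(forest_probs[0][0])
    return [
        [
            sum(w * probs[i][c] for w, probs in zip(weights, forest_probs))
            for c in range(n_classes)
        ]
        for i in range(n_samples)
    ]


def sample_windows(n_features, window, n_windows, rng):
    """Draw distinct random window start positions instead of sliding the
    window over every position (assumed reading of the feature sampling)."""
    return sorted(rng.sample(range(n_features - window + 1), n_windows))


def subsample(rows, n, rng):
    """Simple random sampling of n data instances without replacement."""
    return rng.sample(rows, n)
```

With two forests, the one whose predicted probabilities agree better with the labels receives the larger weight, so a forest that happened to pick poor root-node features contributes less to the layer output. Sampling window positions rather than enumerating all of them reduces the number of overlapping, largely repeated feature windows produced by multi-granularity scanning.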
Keywords/Search Tags:Outlier detection, Deep forest, Outlier factor, Stellar spectrum, Multiple sampling