
Research On Decision Tree Algorithm For Uncertain Data

Posted on: 2019-12-30
Degree: Master
Type: Thesis
Country: China
Candidate: X D Qin
Full Text: PDF
GTID: 2518306047961369
Subject: Control theory and control engineering
Abstract/Summary:
Rapid advances in information technology mean that large datasets are now collected in every field. Analyzing these datasets and mining the valuable knowledge hidden in them is of great importance, and data mining has developed rapidly to meet this need. Classification is one of the key branches of data mining and provides decision support for many professions, and decision tree algorithms are among the most important classification methods. However, many datasets acquired in practice are uncertain, and classical decision tree algorithms, which are designed for certain data, cannot mine information from them. How to extend decision tree algorithms from certain data to uncertain data has therefore become one of the most active research areas in data mining. This thesis focuses on decision tree algorithms for uncertain data.

Data pretreatment is an indispensable step before classification. A mathematical model based on the probability distribution function is built to describe uncertain datasets and their characteristics. On the basis of this model, pretreatment procedures for uncertain datasets are designed to be consistent with the classification algorithms. These procedures, which comprise data filtering, outlier removal, distribution standardization, and correction of measurement errors, make the datasets suitable for a decision tree for uncertain data.

In constructing the decision tree, attribute selection and split calculation are the core of this thesis. For attribute selection, a visualization-based attribute selection algorithm is proposed that combines the classical attribute selection method with visualization technology. For split calculation, a novel method is proposed that treats the dataset as a whole from a statistical point of view, focusing mainly on the mean and the variance, rather than applying the extra treatment that continuous data requires in the classical setting. Different split calculation methods are proposed for the case where the probability distribution functions of the data overlap and the case where they do not, so that the proposed decision tree algorithms adapt to uncertain datasets with different characteristics.

The simulations verify the viability of the visualization-based decision tree algorithm for uncertain data. The algorithm can handle continuous datasets directly and is robust to datasets with missing values.
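The abstract does not give the thesis's formulas, so the following is only a minimal sketch of how a split on uncertain data might be scored when each value is described by a mean and a variance. It assumes every uncertain value follows a Gaussian distribution, uses a generic fractional-count entropy criterion rather than the split method proposed in the thesis, and treats two distributions as overlapping when their mean ± k·std intervals intersect. The function names, the Gaussian assumption, and the default `k = 3` are all hypothetical choices made for illustration.

```python
import math


def normal_cdf(x, mean, std):
    """P(X <= x) for X ~ Normal(mean, std**2)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def intervals_overlap(mean_a, std_a, mean_b, std_b, k=3.0):
    """Assumed overlap test: do the mean +/- k*std intervals intersect?"""
    return abs(mean_a - mean_b) <= k * (std_a + std_b)


def entropy(weights):
    """Shannon entropy (in bits) of a collection of non-negative class weights."""
    weights = list(weights)
    total = sum(weights)
    if total <= 0:
        return 0.0
    return -sum((w / total) * math.log2(w / total) for w in weights if w > 0)


def expected_split_entropy(samples, threshold):
    """Weighted entropy of the two branches produced by `threshold`.

    `samples` is a list of (mean, std, label) tuples. Each sample contributes
    the probability mass below the threshold to the left branch and the
    remaining mass to the right branch (fractional counts).
    """
    left, right = {}, {}
    for mean, std, label in samples:
        p_left = normal_cdf(threshold, mean, std)
        left[label] = left.get(label, 0.0) + p_left
        right[label] = right.get(label, 0.0) + (1.0 - p_left)
    n_left = sum(left.values())
    n_right = sum(right.values())
    n = n_left + n_right
    return (n_left / n) * entropy(left.values()) + (n_right / n) * entropy(right.values())


def best_threshold(samples, candidates):
    """Return the candidate threshold with the lowest expected split entropy."""
    return min(candidates, key=lambda t: expected_split_entropy(samples, t))


# Illustrative data: two well-separated classes on one uncertain attribute.
samples = [
    (0.0, 0.1, "a"),
    (1.0, 0.1, "a"),
    (5.0, 0.1, "b"),
    (6.0, 0.1, "b"),
]
chosen = best_threshold(samples, [0.5, 3.0, 5.5])
```

With these illustrative samples the two classes do not overlap, so a threshold placed in the gap between them separates the probability mass cleanly. When distributions do overlap, the same scoring still applies, but each sample's mass is shared between both branches.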
Keywords/Search Tags:data mining, classification, decision tree algorithm, uncertain data, visualization, probability distribution function