
Information Theoretic Data Mining

Posted on: 2020-02-17  Degree: Master  Type: Thesis
Country: China  Candidate: Y F Yang
GTID: 2428330620960049  Subject: Information and Communication Engineering
Abstract/Summary:
With the advancement of technologies such as wireless mobile communications, the Internet, and smart terminal devices, massive amounts of data are being generated and collected at an exponentially growing rate. How to process and analyze such data, and how to discover and extract useful knowledge from it, are questions worth considering. They all involve data mining, a key technology in big data. Data mining and information theory are related, and some researchers have tried to understand and solve (big) data mining problems from the perspective of information and communication theory. Recent work has shown that information theory can provide methodologies and strategies for data mining, and that the resulting methods achieve good results, suit large data sets, and are highly interpretable. Building on this, this thesis proposes two information-theoretic data mining methods: an adaptive equalizer time series analysis model and a J divergence decision tree classification algorithm.

The adaptive equalizer time series analysis model assumes that information flows between the target time series and a related time series. A single-input equalizer is therefore established between the related time series and the target time series to predict and estimate the target. While the model is being built, the equalizer is trained to obtain the optimal equalizer length and tap coefficients. The final prediction of the target time series is then obtained by keeping the optimal length fixed and updating the tap coefficients online by gradient descent. Next, the historical target time series is introduced as an additional input to form the complete multi-input equalizer time series analysis model. The experimental results show that the single-input model reflects the correlation between time series to a certain extent, and that the complete multi-input model outperforms existing time series analysis models with external inputs in terms of RMSE.

The J divergence decision tree classification algorithm inherits the clarity, conciseness, and interpretability of decision tree classifiers. Exploiting the way J divergence amplifies the influence of zero-valued probabilities, the thesis proposes a new criterion for partitioning the sample set and splitting the tree, and builds the complete J divergence decision tree classification algorithm on it. At each split, a J divergence decision tree tends to choose the feature that makes the number of samples of some category in a subset as small as possible, or zero, so its structure differs from that of the existing CART classification tree. When the maximum depth of the tree is not limited, the J divergence decision tree has a lower classification error rate than the CART classification tree. The final experimental results show that the J divergence decision tree classification algorithm is better suited than other classification algorithms to complex, large data sets with many categorical features.

Based on information (communication) theory, this thesis proposes two methods for different data mining tasks, each of which performs well on its own problem. The work therefore further shows that information and communication theory can provide methodologies and strategies for (big) data mining analysis.
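The single-input equalizer described above can be illustrated with a minimal sketch. It assumes a standard LMS (least mean squares) update, which is one common form of online gradient descent on the squared prediction error; the abstract does not state the exact update rule, and the names `lms_equalizer`, `length`, `step`, and `rmse` are hypothetical choices for illustration.

```python
import math


def lms_equalizer(related, target, length=4, step=0.05):
    """Predict target[t] from the previous `length` samples of `related`.

    Assumption: plain LMS update; the thesis may use a different rule.
    Returns the list of one-step predictions and the final tap coefficients.
    """
    taps = [0.0] * length
    predictions = []
    for t in range(length, len(target)):
        # Most recent related sample first: window[0] == related[t - 1].
        window = related[t - length:t][::-1]
        estimate = sum(w * x for w, x in zip(taps, window))
        error = target[t] - estimate
        # Gradient-descent step on the squared error, applied online.
        taps = [w + step * error * x for w, x in zip(taps, window)]
        predictions.append(estimate)
    return predictions, taps


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))
```

Under this sketch, the equalizer length could be chosen by running `lms_equalizer` for several candidate values of `length` on training data and keeping the one with the lowest `rmse`. A multi-input variant would concatenate a second window drawn from the target's own history.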
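The role of J divergence in splitting can be sketched as follows. J divergence (Jeffreys divergence) between two distributions P and Q is the symmetrized Kullback-Leibler divergence, J(P, Q) = sum_i (p_i - q_i) ln(p_i / q_i). The abstract does not give the thesis's actual partitioning criterion, so the scoring rule below, which compares the class distributions of the two child subsets, is only one plausible reading; `split_score`, `class_distribution`, and the smoothing constant `eps` are assumptions introduced for illustration.

```python
import math
from collections import Counter


def class_distribution(labels, classes, eps=1e-6):
    """Smoothed class probabilities; eps keeps zero counts finite."""
    counts = Counter(labels)
    total = len(labels) + eps * len(classes)
    return [(counts[c] + eps) / total for c in classes]


def j_divergence(p, q):
    """Jeffreys divergence: KL(p || q) + KL(q || p)."""
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))


def split_score(left_labels, right_labels, eps=1e-6):
    """Hypothetical split score: J divergence between the class
    distributions of the two child subsets (not the thesis's formula)."""
    classes = sorted(set(left_labels) | set(right_labels))
    p = class_distribution(left_labels, classes, eps)
    q = class_distribution(right_labels, classes, eps)
    return j_divergence(p, q)
```

Because a near-zero probability on one side makes the logarithm large, this score grows sharply when a split removes a category from one subset, which is consistent with the abstract's remark that the tree favors splits leaving some category with few or no samples.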
Keywords/Search Tags:Big Data, information theory, data mining, adaptive equalizer, time series analysis, J-Divergence, decision tree classification