CHAID Algorithm Parallelization And Application In Credit Risk Analysis

Posted on: 2017-01-10
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Yang
GTID: 2308330503979768
Subject: Computer Science and Technology
Abstract/Summary:
With the development of cloud computing in recent years, research on big data and big data processing platforms has become increasingly important. The MapReduce parallel computing model, implemented in Hadoop since 2005, has become a basic model for big data research. Spark, officially open-sourced in 2010, handles interactive queries and iterative computation well. As a result it processes data faster than Hadoop and is expected to become another important big data processing tool. Data mining is one of the core modules of big data processing and demands high processing speed, a requirement Spark can fully meet. The national economy is developing rapidly, and the financial industry is one of its key indicators, so research on credit risk, one of the three major risks faced by commercial banks, is important. However, few classification algorithms have been implemented on the Spark platform. This thesis studies the parallelization of the CHAID algorithm and its application on Spark.

First, the Spark platform is analyzed in detail, and the classification algorithms used in data mining are compared and summarized. Then CHAID is improved into the FCHAID algorithm, which treats the interactions between independent variables more fairly and thereby improves classification. FCHAID is combined with a logistic regression model to obtain a better model. FCHAID is then parallelized on Spark using data parallelism, and the performance of single-machine processing is compared with that of Spark parallel processing. Finally, FCHAID is applied to the public German credit dataset from the UCI repository, and a scoring model is built from it. By analyzing the credit behavior of bank customers, the model gives banks a scientific basis for reducing losses from credit risk.
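CHAID chooses each split by testing how strongly every categorical predictor is associated with the target class, using a chi-square test on a contingency table of counts. Because those counts are additive, they can be computed per data partition and merged, which is what makes a data-parallel version natural. The sketch below is a minimal illustration under stated assumptions. It shows only standard CHAID's chi-square predictor selection, not the thesis's FCHAID modification, which the abstract does not specify. It omits CHAID's category merging and Bonferroni adjustment, and it simulates Spark's map and reduce steps with plain Python. The function names, field names (`housing`, `purpose`, `risk`) and toy rows are hypothetical.

```python
import math
from collections import Counter
from functools import reduce


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: normal-tail term erfc(sqrt(x/2)) plus a finite series.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total


def partition_counts(rows, predictors, target):
    """Map step (hypothetical helper): count (predictor, category, class) triples in one partition."""
    counts = Counter()
    for row in rows:
        for name in predictors:
            counts[(name, row[name], row[target])] += 1
    return counts


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for {(category, class): count}."""
    cats = sorted({c for c, _ in table})
    classes = sorted({k for _, k in table})
    n = sum(table.values())
    row_tot = {c: sum(table.get((c, k), 0) for k in classes) for c in cats}
    col_tot = {k: sum(table.get((c, k), 0) for c in cats) for k in classes}
    stat = 0.0
    for c in cats:
        for k in classes:
            expected = row_tot[c] * col_tot[k] / n
            stat += (table.get((c, k), 0) - expected) ** 2 / expected
    return stat, (len(cats) - 1) * (len(classes) - 1)


def select_split(partitions, predictors, target):
    """Return (predictor, p_value) for the predictor most associated with the target."""
    # Reduce step: counts are additive, so partitions can be merged in any order.
    merged = reduce(lambda a, b: a + b,
                    (partition_counts(part, predictors, target) for part in partitions),
                    Counter())
    best = None
    for name in predictors:
        table = {(c, k): n for (q, c, k), n in merged.items() if q == name}
        stat, df = chi_square(table)
        if df == 0:
            continue  # a single category (or single class) offers no split
        p_value = chi2_sf(stat, df)
        if best is None or p_value < best[1]:
            best = (name, p_value)
    return best


# Hypothetical toy data split into two "partitions".
partitions = [
    [{"housing": "own", "purpose": "car", "risk": "good"},
     {"housing": "rent", "purpose": "car", "risk": "bad"}],
    [{"housing": "own", "purpose": "tv", "risk": "good"},
     {"housing": "rent", "purpose": "tv", "risk": "bad"}],
]
result = select_split(partitions, ["housing", "purpose"], "risk")
print(result)  # housing wins: chi-square 4.0 on 1 df, p = erfc(sqrt(2)) ≈ 0.0455
```

On an actual Spark cluster, the same two steps would roughly correspond to emitting `((predictor, category, class), 1)` pairs from each partition and summing them with `reduceByKey`. A full CHAID implementation would also merge categories whose class distributions do not differ significantly, and it would use Bonferroni-adjusted p-values before splitting.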
Keywords/Search Tags: Spark, FCHAID algorithm, big data, data mining, credit risk models, parallelization