Research On Decision Algorithm Of E-commerce System Based On Weka

Posted on: 2016-03-13
Degree: Master
Type: Thesis
Country: China
Candidate: Y Wu
Full Text: PDF
GTID: 2348330476955300
Subject: Information and Communication Engineering
Abstract/Summary:
With the continuous development of information technology, the Internet has become a mainstream platform for transmitting information. As databases grow in scale and are applied more widely and deeply, data mining becomes increasingly important. Data mining draws on machine learning, pattern recognition, statistics, and database technology, and it is widely used in e-commerce systems, bank credit systems, the insurance industry, telecommunication systems, medical systems, and other fields. Decision tree classification is among the most commonly used families of data mining algorithms and is regarded as a classic approach; it includes the well-known ID3, C4.5, and CART algorithms. Improving the efficiency of the C4.5 algorithm is therefore an important research topic. Building on a study of decision tree classification, this thesis investigates the C4.5V1 algorithm. First, we analyse the functions and system framework of the WEKA data mining platform and its C4.5 algorithm, test the classic algorithms on the training data provided with the platform, and compare and analyse their performance. Second, we study the C4.5 algorithm in depth, design the system modules of the algorithm, implement the code, and integrate it into the WEKA platform. Using online test data sets, we compare the classical C4.5 algorithm with C4.5V1. The results verify that C4.5V1 integrates well into the WEKA platform and that its performance is superior to that of the classic C4.5 algorithm, although C4.5V1 takes more time to build a model. Building on C4.5V1, we then propose two improved algorithms, C4.5V2 and C4.5V3.
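For readers unfamiliar with the split criterion behind classic C4.5, the following is a minimal sketch of the gain ratio (information gain divided by split information) for a nominal attribute. It is illustrative only: the function names and the toy data are assumptions made for this example, and it is neither the thesis code nor WEKA's implementation.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(items)
    return -sum((n / total) * math.log2(n / total) for n in Counter(items).values())


def gain_ratio(values, labels):
    """Gain ratio of a nominal attribute: information gain / split information."""
    total = len(labels)
    # Group the class labels by attribute value.
    partitions = {}
    for value, label in zip(values, labels):
        partitions.setdefault(value, []).append(label)
    # Weighted entropy of the class labels after the split.
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    info_gain = entropy(labels) - remainder
    # Split information is the entropy of the attribute values themselves.
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data: attribute_a separates the classes, attribute_b does not.
labels = ["yes", "yes", "no", "no"]
attribute_a = ["x", "x", "y", "y"]
attribute_b = ["p", "q", "p", "q"]
ratio_a = gain_ratio(attribute_a, labels)
ratio_b = gain_ratio(attribute_b, labels)
```

At each node, C4.5 evaluates the candidate attributes with this kind of score and splits on the best one. Every evaluation calls the logarithm repeatedly, which is the cost the later variants try to reduce.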
The C4.5V2 and C4.5V3 algorithms aim to improve classification accuracy and reduce modeling time. C4.5V2 improves classification accuracy by introducing the concept of redundancy between attributes, which weakens the influence of other attributes on the classification by the current attribute. However, C4.5V2 needs more time to build the model, because its gain in accuracy comes at the expense of modeling time. The improved C4.5V3 algorithm addresses this problem by converting the large number of logarithmic formulas into arithmetic operations, which simplifies the computation and greatly reduces modeling time. Finally, we implement the improved C4.5V2 and C4.5V3 algorithms, integrate them into the WEKA platform, test their performance, and apply them to a data set from an e-commerce system. Experiments comparing C4.5 and its improved versions in detail on classification accuracy and modeling time show that the proposed algorithms improve both accuracy and time complexity.
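The abstract does not state which formula C4.5V3 uses to replace the logarithms, so the sketch below should not be read as the thesis's method. It shows one well-known way to remove logarithms from an entropy-style score: substituting the first-order expansion ln p ≈ p − 1, which turns −Σ p ln p into Σ p(1 − p) = 1 − Σ p², the Gini impurity. All names and data here are assumptions for illustration.

```python
from collections import Counter


def log_free_impurity(labels):
    """Entropy with ln(p) replaced by (p - 1): 1 - sum(p^2), using only arithmetic."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def log_free_gain(values, labels):
    """Impurity reduction from splitting on a nominal attribute, without logarithms."""
    total = len(labels)
    # Group the class labels by attribute value.
    partitions = {}
    for value, label in zip(values, labels):
        partitions.setdefault(value, []).append(label)
    # Weighted impurity of the class labels after the split.
    remainder = sum(len(p) / total * log_free_impurity(p) for p in partitions.values())
    return log_free_impurity(labels) - remainder


# Hypothetical toy data: attribute_a separates the classes, attribute_b does not.
labels = ["yes", "yes", "no", "no"]
attribute_a = ["x", "x", "y", "y"]
attribute_b = ["p", "q", "p", "q"]
gain_a = log_free_gain(attribute_a, labels)
gain_b = log_free_gain(attribute_b, labels)
```

On this toy data the log-free score ranks the two attributes the same way an entropy-based score would, while needing only multiplications and additions. Whether such an approximation preserves accuracy on real data sets is an empirical question of the kind the thesis experiments address.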
Keywords/Search Tags: Data mining, decision tree, C4.5 algorithm, WEKA, classification algorithm