
Research And Applications Of Improved FP_Growth Algorithm Based On E-government

Posted on: 2015-06-01
Degree: Master
Type: Thesis
Country: China
Candidate: X Ran
GTID: 2308330482956949
Subject: Computer application technology
Abstract/Summary:
With the rapid development of information technology, the volume of e-government data is surging. Traditional database technology has difficulty discovering the useful information hidden in such vast amounts of data, so the data are merely stored rather than used effectively. To address this problem, data mining technology has gradually been introduced into e-government, and good results have been achieved. Association rule mining, which reveals interesting relationships between item-sets, is an important topic in data mining and is widely used in business, health care, networking and communications, biology, and other fields.

The concept of association rules was proposed by Agrawal et al. in 1993. Based on an analysis of supermarket shopping, they proposed the classical Apriori algorithm, which aroused the interest of scholars in China and abroad and led to the development of many further algorithms. One of the most widely used is the FP_Growth algorithm proposed by J. Han et al., which is efficient because it does not need to generate candidate item-sets.

The main work of this thesis is as follows:

(1) The FP_Growth algorithm is analyzed and three shortcomings are identified. First, it still needs to scan the database twice, which adds cost. Second, the mining process traverses the FP-Tree and the conditional FP-Trees many times, which is inefficient. Third, the whole process repeatedly traverses the item header table, which is stored as a sequential structure throughout the algorithm; its low query efficiency lowers the efficiency of the whole algorithm.

(2) To address these shortcomings, this thesis improves the FP_Growth algorithm by introducing a new structure, the FP-Table, and proposes an improved algorithm based on it, TFP_Growth. The FP-Table is generated by means of a two-dimensional table, and frequent item-sets are then mined from the FP-Table, so the method needs to scan the database only once, which greatly improves the efficiency of the algorithm. Analysis of TFP_Growth shows, however, that the two-dimensional table often contains much invalid data, which wastes memory. This thesis therefore proposes two optimization schemes. Scheme one compresses the two-dimensional table: after compression the required space is only half of the original, which greatly improves the space efficiency of the algorithm. Scheme two targets sparse data sets: by scanning the database twice, it avoids generating invalid data and handles sparse data more efficiently.

(3) The algorithm is then adapted to the characteristics of e-government data. Taking the letters-and-calls (petition) domain of e-government as an example, the thesis walks through the application process, including data selection, data processing, data mining, and application of the resulting information. Integrating the whole process, it proposes a data mining application framework that can be applied in e-government systems.

(4) Finally, experiments show that TFP_Growth clearly outperforms FP_Growth in both time and space efficiency.
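For reference, the baseline criticized in item (1) can be sketched in Python. This is a minimal sketch of FP_Growth in its usual Han et al. formulation, not the thesis's code; the names `FPNode`, `build_fp_tree`, and `fp_growth` are hypothetical. It shows the two passes over the data (one to count items, one to insert ordered transactions) and the recursive mining of conditional pattern bases. The header table here is a Python dict, so finding an item's node list is an average O(1) hash lookup instead of a sequential search; that choice is an assumption for illustration and is not the FP-Table design.

```python
from collections import defaultdict


class FPNode:
    """One FP-tree node: an item, its count, its parent and its children."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Hypothetical helper: build an FP-tree from (items, count) pairs.

    Returns the root, a header table mapping each frequent item to its
    tree nodes, and the counts of the frequent items.
    """
    # Pass 1 over the data: count how often each item occurs.
    freq = defaultdict(int)
    for items, count in transactions:
        for item in set(items):
            freq[item] += count
    freq = {item: c for item, c in freq.items() if c >= min_support}

    root = FPNode(None, None)
    # Assumption: header table kept as a dict (hash lookup), not a sequential list.
    header = defaultdict(list)
    # Pass 2 over the data: insert frequent items in descending-frequency order.
    for items, count in transactions:
        ordered = sorted((i for i in set(items) if i in freq),
                         key=lambda i: (-freq[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return root, header, freq


def fp_growth(transactions, min_support, suffix=()):
    """Yield (itemset, support) for every frequent itemset (hypothetical sketch)."""
    _, header, freq = build_fp_tree(transactions, min_support)
    # Mine items from least to most frequent.
    for item in sorted(freq, key=lambda i: (freq[i], i)):
        new_suffix = suffix + (item,)
        yield tuple(sorted(new_suffix)), freq[item]
        # Conditional pattern base: the prefix path above each node of `item`.
        cond_base = []
        for node in header[item]:
            path = []
            parent = node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                cond_base.append((path, node.count))
        # Recurse on the conditional FP-tree built from that base.
        yield from fp_growth(cond_base, min_support, new_suffix)


db = [["bread", "milk"],
      ["bread", "diaper", "beer", "eggs"],
      ["milk", "diaper", "beer", "cola"],
      ["bread", "milk", "diaper", "beer"],
      ["bread", "milk", "diaper", "cola"]]
frequent = dict(fp_growth([(t, 1) for t in db], min_support=3))
```

On this small basket data set, `frequent` maps each item-set that occurs in at least three transactions to its support count, e.g. `("beer", "diaper")` to 3. Note that every recursive call rebuilds a conditional tree and traverses it again, which is the repeated-traversal cost that item (1) points out.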
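The abstract does not spell out the layout of the FP-Table's two-dimensional table. One plausible reading, assumed here purely for illustration, is a symmetric item-by-item count matrix: cell (i, j) counts the transactions containing both items, and the diagonal holds single-item counts. Such a table can be filled in a single pass over the data, and because it is symmetric only its upper triangle needs to be stored: n(n+1)/2 cells instead of n², which approaches half as n grows. The sketch below uses hypothetical names (`TriangularPairTable`, `frequent_pairs`) and assumes the item vocabulary is known in advance, e.g. a fixed list of category codes; if it is not, an extra pass is needed to collect it. It only reads off frequent 1- and 2-item-sets and is not the thesis's TFP_Growth mining procedure.

```python
from itertools import combinations


class TriangularPairTable:
    """Hypothetical compressed two-dimensional count table.

    Assumption: cell (i, j), i <= j, counts transactions containing both
    item i and item j; the diagonal (i, i) counts item i alone. The full
    n x n matrix is symmetric, so only its upper triangle is stored, in a
    flat list of n * (n + 1) // 2 cells instead of n * n.
    """

    def __init__(self, items):
        # Assumption: the item vocabulary is known before the data is scanned.
        self.items = sorted(items)
        self.index = {item: k for k, item in enumerate(self.items)}
        self.n = len(self.items)
        self.cells = [0] * (self.n * (self.n + 1) // 2)

    def _offset(self, i, j):
        """Position of cell (i, j) in the row-major packed upper triangle."""
        if i > j:
            i, j = j, i
        # Rows 0..i-1 hold n, n-1, ..., n-i+1 cells respectively.
        return i * self.n - i * (i - 1) // 2 + (j - i)

    def add_transaction(self, transaction):
        ks = sorted({self.index[item] for item in transaction})
        for k in ks:  # diagonal: single-item counts
            self.cells[self._offset(k, k)] += 1
        for i, j in combinations(ks, 2):  # off-diagonal: pair counts
            self.cells[self._offset(i, j)] += 1

    def count(self, a, b=None):
        """Support of {a}, or of {a, b} when b is given."""
        i = self.index[a]
        j = i if b is None else self.index[b]
        return self.cells[self._offset(i, j)]


def frequent_pairs(transactions, items, min_support):
    """Fill the table in one pass, then read off frequent 1- and 2-itemsets."""
    table = TriangularPairTable(items)
    for t in transactions:  # the only pass over the transaction data
        table.add_transaction(t)
    singles = {(a,): table.count(a) for a in table.items
               if table.count(a) >= min_support}
    pairs = {(a, b): table.count(a, b) for a, b in combinations(table.items, 2)
             if table.count(a, b) >= min_support}
    return table, singles, pairs


db = [["bread", "milk"],
      ["bread", "diaper", "beer", "eggs"],
      ["milk", "diaper", "beer", "cola"],
      ["bread", "milk", "diaper", "beer"],
      ["bread", "milk", "diaper", "cola"]]
vocabulary = ["beer", "bread", "cola", "diaper", "eggs", "milk"]
table, singles, pairs = frequent_pairs(db, vocabulary, min_support=3)
```

With six items the packed table holds 21 cells instead of 36. Keeping a zero cell for every item pair, even pairs that never co-occur, is exactly the kind of wasted space that grows on sparse data, which is the case that scheme two is said to target.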
Keywords/Search Tags: E-government, data mining application, association rule, Apriori algorithm, FP_Growth algorithm