Data Mining Algorithm Analysis and Improvement Based on Frequent Patterns

Posted on: 2008-07-03
Degree: Master
Type: Thesis
Country: China
Candidate: J Jia
Full Text: PDF
GTID: 2178360242458820
Subject: Computer applications
Abstract/Summary:
In the age of data explosion, data mining plays an increasingly important role. Recently, many data stream applications have emerged, such as network logs and sensor networks, and data stream mining has become both a new research field and a useful tool in many application areas. Traditional data mining methods cannot process this new model effectively, because the data are unbounded, change frequently, and come in a wider variety of structural types.

Finding the most frequent itemsets is an important problem in data mining, a basic research topic, and the basis for many data mining methods. In practice, users need to adjust the minimum support threshold in order to find the most useful frequent itemsets.

Among the many existing algorithms for frequent pattern mining, Apriori and FP_TREE are typical. The main characteristic of the Apriori algorithm is that it mines level by level, starting from single items and pruning at every step. With these features, Apriori effectively avoids searching many itemsets that cannot be frequent. However, one problem with Apriori is the cost of generating its candidate itemsets. The other algorithm, FP_TREE, uses a divide-and-conquer method: it compresses the information in the dataset into a frequent pattern tree that describes the frequent items, then iteratively builds auxiliary structures to grow frequent patterns and partition the database.

In this thesis, we make some improvements to the FP_TREE algorithm and compare it with other algorithms. The improved algorithm classifies the frequent patterns in the data stream, so that the category information in the data is preserved while the data are compressed.
The experiments indicate that this algorithm is considerably more accurate than the others, and that it also handles applications with many missing values in the training set better.

In sum, when mining large data sets, the improved FP_TREE can better solve the fusion problem of large data sets by applying frequent pattern data mining. Moreover, both the accuracy and the running time of data classification improve, and the introduction of a window mechanism gives better control over the window size.
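
For readers unfamiliar with the level-wise search that the abstract attributes to Apriori, the following is a minimal sketch of the textbook algorithm, not the thesis's implementation or its improved FP_TREE. The function name `apriori`, the parameter `min_support` (an absolute transaction count), and the toy transactions are assumptions made for illustration only.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset (frozenset): support count} for all frequent itemsets.

    min_support is an absolute count of transactions (illustrative choice).
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items and keep those meeting the threshold.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset must itself be frequent.
            if all(frozenset(sub) in current
                   for sub in combinations(union, k - 1)):
                candidates.add(union)

        # Count step: one pass over the data for this level.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1

        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1

    return frequent


# Hypothetical toy data, not taken from the thesis.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]
result = apriori(transactions, min_support=3)
```

The join and count steps are where the candidate generation cost mentioned in the abstract arises: each level builds a candidate set and rescans the data. Lowering `min_support` from 3 to 2 on the toy data admits the three-item set as well, which illustrates why users tune the minimum support threshold.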
Keywords/Search Tags: Data mining, Stream Data, Frequent Pattern, FP_TREE, Apriori