Research And Improvement The Algorithm Of Mining Frequent Item Sets In Text Association Analysis

Posted on:2009-02-26

Degree:Master

Type:Thesis

Country:China

Candidate:F Hao

GTID:2178360245465690

Subject:Computer software and theory

Abstract/Summary:

In the information era we are confronted with enormous volumes of data. Helping people collect and filter the information they are interested in, and discover latent, useful knowledge in this growing mass of information, has become a hot topic in information technology. Data mining and knowledge discovery in databases emerged in response to this need. Text association analysis, which finds connections between different words in a document collection, is an important task in text mining. Most methods apply the association rule techniques of conventional data mining.

First, this thesis studies the characteristics of keyword-based text association analysis, which closely resembles conventional association analysis. If each text is regarded as a transaction and each keyword as an item, keyword association analysis on a text database is transformed into association analysis on an ordinary database. However, because text data is high-dimensional and sparse, applying the same minimum support threshold to different text databases yields frequent item sets of hugely different sizes. Setting the support threshold is therefore a difficulty in text association analysis.

Second, this thesis studies IntvMatrix, an algorithm for mining the N most frequent item sets. The algorithm dynamically adjusts the support threshold, so the number of frequent item sets produced can be controlled through the input number N. Its drawbacks are that constructing the inverted matrix wastes space, and that building the associations between items requires many database scans, which wastes time.

Third, to address the problems of IntvMatrix, this thesis proposes an algorithm for mining the N most frequent item sets based on an improved FP-Tree. It sorts the items and the whole database and deletes the infrequent items, which reduces the time spent searching for shared prefixes; it then constructs the COFI-Tree of local frequent items from the FP-Tree. The algorithm retains the strategy of dynamically adjusting the support threshold, which guarantees that the N most frequent item sets are produced.

Finally, different values of N are supplied for the same text database, and the new algorithm is compared with IntvMatrix. The results show that the new algorithm improves time and space efficiency, because it uses the improved FP-Tree to construct local COFI-Trees together with an optimized data structure.
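
The dynamic support threshold described above can be illustrated with a minimal Python sketch. It is not the IntvMatrix algorithm or the FP-Tree/COFI-Tree algorithm of the thesis, whose details the abstract does not give; a plain level-wise search is substituted for those structures. The function name `top_n_frequent_itemsets` and the sample documents are hypothetical. Each document is treated as a transaction and each keyword as an item, and the threshold rises to the support of the weakest of the N best item sets found so far.

```python
import heapq
from itertools import combinations


def top_n_frequent_itemsets(documents, n):
    """Return up to n (support, items) pairs with the highest support counts.

    Hypothetical helper for illustration; ties at the cut-off are broken arbitrarily.
    """
    if n <= 0:
        return []
    transactions = [frozenset(doc) for doc in documents]
    heap = []       # min-heap holding the best n item sets seen so far
    threshold = 1   # dynamic minimum support; it only ever rises

    def offer(itemset, support):
        nonlocal threshold
        entry = (support, tuple(sorted(itemset)))
        if len(heap) < n:
            heapq.heappush(heap, entry)
        elif support > heap[0][0]:
            heapq.heapreplace(heap, entry)
        if len(heap) == n:
            # Once n item sets are held, the weakest one sets the threshold.
            threshold = max(threshold, heap[0][0])

    # Level 1: count how many documents contain each keyword.
    level = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            level[key] = level.get(key, 0) + 1

    while level:
        for itemset, support in level.items():
            offer(itemset, support)
        # Only item sets that still meet the raised threshold can be extended.
        survivors = [s for s, c in level.items() if c >= threshold]
        size = len(survivors[0]) + 1 if survivors else 0
        candidates = {a | b for a, b in combinations(survivors, 2)
                      if len(a | b) == size}
        level = {}
        for cand in candidates:
            support = sum(1 for t in transactions if cand <= t)
            if support >= threshold:
                level[cand] = support

    return sorted(heap, key=lambda e: (-e[0], e[1]))


docs = [
    {"data", "mining", "text"},
    {"data", "mining"},
    {"data", "mining", "tree"},
    {"data", "text"},
    {"tree"},
]
print(top_n_frequent_itemsets(docs, 3))
# [(4, ('data',)), (3, ('data', 'mining')), (3, ('mining',))]
```

Pruning with the current threshold is safe because every subset of an item set is at least as frequent as the item set itself, so no item set that belongs in the final top N is discarded early. The sketch rescans all documents for each candidate, which is exactly the kind of repeated-scan cost that a prefix-tree structure is meant to avoid.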

Keywords/Search Tags:

data mining, text data mining, text association analysis, N most frequent item sets, FP-Tree, COFI-Tree

Related items

1	Research On Mining Algorithms Of Maximal Frequent Item Sets
2	Research On Frequent Closed Pattern Mining Algorithm Based On COFI-Tree
3	Research On Algorithm Of Mining Association Rules Based On FP Tree
4	Research On Algorithm Of Mining Association Rules Based On Fp Tree
5	Studies On Algorithms Of Association Rule Mining In Data Mining
6	Research And Application Of Association Rull Mining Algorithm In The Data Mining
7	Search Of Algorithms For Mining Maximum Frequent Item-sets
8	Study On Association Rules Mining Algorithm Based On FP-tree
9	Research And Application Of Association Rules Mining Based On FP-tree
10	The Research On The Related Problems Of Association Rule Mining