
Research on High Utility Itemset Mining Algorithms over Data Streams

Posted on: 2015-09-26
Degree: Master
Type: Thesis
Country: China
Candidate: H H Mu
Full Text: PDF
GTID: 2298330431992564
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of data storage technology, discovering potentially useful information in large amounts of data has become a major challenge. As data streams appear in more and more application areas, data stream mining has become a new direction in data mining research. Unlike a traditional static database, a data stream is continuous, unbounded, and high-speed. Frequent itemset mining is an important method in data stream mining, but traditional frequent itemset mining measures the importance of an itemset by its support alone. It therefore misses itemsets that are infrequent but have high utility, which are often the itemsets users care about most, so mining high utility itemsets over data streams has become an important research topic. Meanwhile, existing algorithms generate a large number of candidate itemsets during mining, which makes it difficult for users to pick out useful information from the huge set of patterns. To address this situation, this thesis studies the problem of mining high utility itemsets over data streams.
First, the thesis describes data stream mining techniques and methods for mining high utility itemsets, and summarizes the existing high utility itemset mining algorithms in terms of their data structures and processing methods. It then identifies the problems of current high utility itemset mining algorithms over data streams, which motivate this study.
The thesis presents an algorithm for mining high utility itemsets over data streams named HUIDE (High Utility Itemsets over Data Stream), which compensates for the shortcomings of traditional high utility itemset mining algorithms and better supports users' decision-making. The algorithm proposes an effective utility measure that takes into account the concept of data information and the user's requirements on itemset utility (profit). High utility itemsets are determined not only by the support of an itemset but, with greater emphasis, by its practical utility. The algorithm then uses a time-based sliding window over the data stream to describe the data distribution more accurately, and constructs a tree structure called the HUI-tree (High Utility Itemsets tree). Nodes in the tree store weighted utility and are arranged in descending order of utility, which makes the tree more compact and effectively reduces the candidate set. Finally, the algorithm traverses the entire tree bottom-up to mine the high utility itemsets. The algorithm reduces the time and memory consumed in scanning the database to produce the mining results. Experimental results on synthetic and real-world data sets show that the algorithm mines high utility itemsets effectively.
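
The notions used in the abstract (itemset utility, a time-based sliding window, and pruning by weighted utility) can be illustrated with a minimal Python sketch. It is not HUIDE and builds no HUI-tree: it enumerates itemsets by brute force, and the class name `SlidingWindowHUI`, its parameters, and the sample data are all hypothetical, chosen only for illustration. It assumes the common definition in which the utility of an itemset in a transaction is the sum of quantity times unit profit over its items, and it uses the standard transaction-weighted utility (TWU) upper bound to discard items that cannot belong to any high utility itemset.

```python
from collections import deque
from itertools import combinations


class SlidingWindowHUI:
    """Brute-force high utility itemset miner over a time-based sliding window.

    Illustrative only: not the HUIDE algorithm described in the abstract.
    """

    def __init__(self, window_span, profits):
        self.window_span = window_span  # window length in time units (assumed)
        self.profits = profits          # unit profit per item (external utility)
        self.window = deque()           # (timestamp, {item: quantity}) pairs

    def add(self, timestamp, transaction):
        """Append a transaction, then evict those that fell out of the window."""
        self.window.append((timestamp, transaction))
        while self.window and self.window[0][0] <= timestamp - self.window_span:
            self.window.popleft()

    def _utility(self, itemset, transaction):
        # An itemset contributes only if the transaction contains all its items.
        if not all(item in transaction for item in itemset):
            return 0
        return sum(transaction[item] * self.profits[item] for item in itemset)

    def mine(self, min_utility):
        """Return {itemset: utility} for itemsets with utility >= min_utility."""
        # TWU of an item: total utility of every transaction that contains it.
        # It bounds the utility of any itemset containing the item from above.
        twu = {}
        for _, tx in self.window:
            tx_utility = sum(qty * self.profits[item] for item, qty in tx.items())
            for item in tx:
                twu[item] = twu.get(item, 0) + tx_utility
        promising = sorted(item for item, value in twu.items() if value >= min_utility)

        result = {}
        for size in range(1, len(promising) + 1):
            for itemset in combinations(promising, size):
                total = sum(self._utility(itemset, tx) for _, tx in self.window)
                if total >= min_utility:
                    result[itemset] = total
        return result


# Hypothetical data: unit profits and three timestamped transactions.
miner = SlidingWindowHUI(window_span=10, profits={"a": 5, "b": 1, "c": 2})
miner.add(1, {"a": 1, "b": 2})
miner.add(5, {"b": 4, "c": 3})
miner.add(12, {"a": 2, "c": 1})  # evicts the transaction at time 1
print(miner.mine(min_utility=10))
# → {('a',): 10, ('a', 'c'): 12, ('b', 'c'): 10}
```

In this toy window, item `c` occurs in both remaining transactions but has utility 8 and is not reported, while `a` occurs only once and reaches the threshold with utility 10. This is the case the abstract describes: an infrequent itemset with high utility that support-based mining would miss.
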
Keywords/Search Tags: high utility, data stream, utility metrics, tree structure