
Mining High Average Utility Sequential Patterns From Uncertain Databases

Posted on: 2017-09-09
Degree: Master
Type: Thesis
Country: China
Candidate: T Li
GTID: 2348330536481729
Subject: Computer technology
Abstract/Summary:
In recent years, data mining has become a critical issue, especially in the big data era, since it can provide implicit information for decision-making. High-utility sequential pattern mining (HUSPM), an emerging research topic in recent decades, considers timestamps as well as internal and external utility factors to mine high-utility sequential patterns (HUSPs). HUSPM does not, however, take the length of patterns into account, so the utility of a discovered HUSP increases with its size (the number of items it contains). High average-utility itemset mining (HAUIM) was introduced in previous work to provide a better measure for the utility-mining framework. Moreover, the above approaches can only handle precise data, in which an item is represented as either 1 or 0 in the database. In real-world applications, data may be collected with a degree of uncertainty, since they may come from various sensors in heterogeneous environments. Most previous works cannot handle uncertain data to derive the required information.

In the first part of this thesis, a new structure called the average-utility (AU)-list is presented to efficiently mine high average-utility itemsets (HAUIs). A depth-first search algorithm named HAUI-Miner is proposed to explore the search space, and an efficient pruning strategy is developed to reduce the search space and speed up the mining process.

In the second part of this thesis, we analyze the relationship between uncertainty and utility. Instead of handling precise data, a new framework called potential high-utility sequential pattern mining (PHUSPM) is presented to process uncertain data. An upper-bound-based algorithm named PHUSPM-UP and a projection-based algorithm named pre-PHUSPM are further developed to mine HUSPs from uncertain databases. Experiments show that the latter outperforms the baseline PHUSPM-UP algorithm. These two algorithms provide a new direction for HUSPM, and its scope can be further extended.

In the third part of this thesis, the average-utility measure is considered in HUSPM since, as mentioned above, the utility of a pattern increases with its length. Two algorithms, named MUHAUSP and Prefix MUHAUSP, are presented to mine potential high average-utility sequential patterns from uncertain databases. Extensive experiments on both real-life and synthetic databases showed that the designed models achieve good mining performance in terms of runtime, number of candidates, and memory usage.

Thus, this thesis explores solutions that consider utility, pattern length, and data uncertainty to mine the required information. The algorithms designed in this thesis provide a new research direction, and several critical issues can be studied and explored in future work.
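The average-utility measure behind the first and third parts can be illustrated on a small itemset example. The sketch below is a minimal illustration assuming the common HAUIM formulation: the utility of an item in a transaction is its quantity times its unit profit, the average utility of an itemset X in a transaction is its utility divided by |X|, and the average-utility upper bound (auub) of X sums the transaction-maximal utility (the largest single-item utility) of every transaction containing X. The abstract does not state which upper bound the thesis's pruning strategy uses, and the toy database, profits, and names (PROFIT, DB, average_utility, tmu, auub, mine_haui) are hypothetical. For simplicity the search is a naive level-wise (Apriori-style) enumeration, not the depth-first AU-list search of HAUI-Miner.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List

# Hypothetical toy data (not from the thesis).
# External utility: unit profit of each item.
PROFIT: Dict[str, int] = {"a": 5, "b": 2, "c": 1, "d": 3}

# Each transaction maps an item to its purchased quantity (internal utility).
DB: List[Dict[str, int]] = [
    {"a": 1, "b": 2, "c": 3},
    {"b": 4, "d": 1},
    {"a": 2, "c": 1, "d": 2},
    {"a": 1, "b": 1, "d": 1},
]


def item_utility(item: str, t: Dict[str, int]) -> int:
    """u(i, T): quantity of item i in transaction T times its unit profit."""
    return t[item] * PROFIT[item]


def average_utility(x: FrozenSet[str]) -> float:
    """au(X): sum over transactions containing X of u(X, T) / |X|."""
    total = 0.0
    for t in DB:
        if x.issubset(t):  # iterating a dict yields its keys (the items)
            total += sum(item_utility(i, t) for i in x) / len(x)
    return total


def tmu(t: Dict[str, int]) -> int:
    """Transaction-maximal utility: the largest single-item utility in T."""
    return max(item_utility(i, t) for i in t)


def auub(x: FrozenSet[str]) -> int:
    """Average-utility upper bound: sum of tmu(T) over transactions containing X."""
    return sum(tmu(t) for t in DB if x.issubset(t))


def mine_haui(min_au: float) -> Dict[FrozenSet[str], float]:
    """Naive level-wise search with auub pruning (illustration only, not HAUI-Miner)."""
    result: Dict[FrozenSet[str], float] = {}
    candidates: List[FrozenSet[str]] = [frozenset([i]) for i in sorted(PROFIT)]
    while candidates:
        # A superset never has a larger auub, so itemsets with auub < min_au
        # are dropped here and never extended.
        promising = [x for x in candidates if auub(x) >= min_au]
        for x in promising:
            au = average_utility(x)
            if au >= min_au:
                result[x] = au
        # Join pairs of promising k-itemsets into (k+1)-itemsets.
        next_level = {x | y for x, y in combinations(promising, 2)
                      if len(x | y) == len(x) + 1}
        candidates = sorted(next_level, key=sorted)
    return result


if __name__ == "__main__":
    for itemset, au in mine_haui(10).items():
        print(sorted(itemset), au)
```

Because an average never exceeds a maximum, au(X, T) ≤ tmu(T) in every transaction, and a superset of X occurs in no more transactions than X does; auub(X) therefore bounds the average utility of X and all of its supersets, so any X with auub(X) below the threshold can be pruned together with its extensions. With min_au = 10 on this toy data, the sketch returns {a}: 20.0, {b}: 14.0, {d}: 12.0, and {a, d}: 12.0, while {a, b} survives pruning (auub = 10) but has an average utility of only 8.0.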
Keywords/Search Tags: data mining, high average-utility, sequential pattern, uncertain data, projection