
Utility Mining Technologies And Its Applications

Posted on: 2021-05-27 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: W S Gan | Full Text: PDF
GTID: 1368330614450948 | Subject: Computer application technology
Abstract/Summary:
In the era of the digital economy, data are typically complex, large in scale, and rich in type. Discovering the "utility characteristics" contained in data is a key and challenging issue in data science. Utility-driven data mining has a wider range of applications and demands than traditional data mining, and it is significant for both theoretical research and engineering practice. In the era of big data, the theories and technologies of utility-driven pattern mining are frontier research topics in data mining. Research built on them has scientific significance for sociology, economics, computer science, data mining, and databases, and has application prospects in market basket analysis, risk analysis and prediction, behavior analysis, and recommendation systems. Utility mining has received extensive attention, but many key technologies and difficulties still need further study.

At present, the existing problems include the following. First, the measurement criteria for utility-based patterns are limited. How should the utility function of the desired patterns be defined, how can the usability of utility mining results be improved, and how can the obtained results be made better?
This is a basic scientific issue. Second, the range of data types that utility mining can process is limited, so its applications are insufficient. Most existing utility mining models and algorithms are designed for various kinds of transaction data, and only some of them handle sequence data. Third, the theories and techniques of utility mining are insufficiently studied. How to define a generalized utility mining model for different types of data, how to define the utility computing model, and how to derive a generalized upper bound on utility values are important scientific problems. To this end, this paper carries out the following research to further expand the connotation and extension of utility mining.

At the level of transaction data, utility mining lacks utility measures, so this paper proposes a new measurement criterion called utility occupancy and develops the high utility occupancy pattern mining (HUOPM) algorithm. The algorithm introduces two highly compressed data structures, the Utility-Occupancy list (UO-list) and the Frequency-Utility table (FU-table), which store the useful information from transaction data, including frequency and utility information. In addition, the concept of remaining utility occupancy helps to calculate an upper bound on utility occupancy and thus reduces the actual search space. Based on several proposed pruning strategies, the HUOPM algorithm needs to scan the database only twice; it then directly constructs the UO-list and extracts the results from the frequency-utility tree. The HUOPM algorithm not only solves the new research problem of mining high utility occupancy patterns (HUOPs) from transaction data, but also ensures that the mining results are complete and that mining is highly efficient.

At the level of sequence data, this paper proposes a compact data structure, the utility-array, to address the problem of
poor mining performance and high memory cost in high-utility sequential pattern mining. The key information in the sequence data (e.g., the utility of the sequence, remaining utility, location, and sequence order) can be stored in the utility-array. Using a projection mechanism, the projection-based utility mining (ProUM) algorithm can quickly construct the utility-arrays of the extensions of a given sequence. This avoids the time-consuming operation common to previous algorithms: first constructing the projected sequence database and then scanning it. This paper also proposes a new upper bound, the sequence extension utility, which is used to prune the search space while guaranteeing the completeness of the final results. ProUM can therefore filter out a large number of undesired sequential patterns early and quickly return the set of high-utility sequential patterns. Extensive experimental results show that ProUM significantly outperforms state-of-the-art high-utility sequential pattern mining algorithms, such as USpan and HUS-Span, in running time, memory consumption, and scalability.

At the level of complex event sequences, existing high-utility episode mining algorithms suffer from poor mining performance and incomplete results, so this paper proposes the UMEpi (Utility Mining of high-utility Episodes) algorithm to discover the complete and accurate set of high-utility episodes from complex event sequences. This paper first proposes the concept of episode-based remaining utility and the correct concept of Episode-Weighted Utilization (EWU), and then proposes a high-utility episode mining algorithm based on the EWU strategy. In addition, two optimized filtering strategies are proposed, which greatly improve the performance of episode mining under the prefix-based extension mechanism. The relevant experimental results show that the UMEpi
algorithm solves the problems of current high-utility episode mining algorithms, which lack a correct overestimated upper bound and effective pruning strategies for the search space. It not only ensures the completeness and correctness of high-utility episode mining, but also scales well when dealing with long or dense event sequences.

At the level of evaluating mining results, this paper introduces a new mining problem based on a correlation measure with the null-transaction invariant property, and proposes two algorithms, CoHUIM and CoUPM, which are based on different mining mechanisms. How to evaluate the results of utility mining, and how to make them better and more practical, are key issues of utility mining. By measuring a correlation factor, the two proposed algorithms are not only efficient, but also yield high-utility patterns whose items are strongly positively correlated, which makes the mined patterns more useful in practice. The CoHUIM algorithm is based on the projection mechanism and the sorted downward closure property of the Kulc metric; the CoUPM algorithm is based on a revised utility-list structure, and its mining performance is better than that of the level-wise CoHUIM algorithm. A large number of experiments show that the correlation-based high-utility patterns are more relevant than the purely high-utility patterns found by previous algorithms. The discovered patterns are more practical for recommendation and cross-selling. The two algorithms study how to discover correlated high-utility itemsets from transaction data. The related theories and techniques can be extended to other branches of utility mining that deal with other types of data (e.g., sequential data and event sequences), for example to mine high-utility episodes with high correlation.
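
As a rough illustration of two measures named above, the following Python sketch computes utility occupancy and the Kulc correlation for an itemset over a toy transaction database. The abstract gives no formulas, so the definitions here are assumptions: utility occupancy is taken as the average share of each supporting transaction's total utility that the itemset contributes, and Kulc as the mean of sup(X)/sup(i) over the items i in X. The database `DB`, its utility values, and all function names are hypothetical, and the code is a brute-force sketch, not the HUOPM, CoHUIM, or CoUPM algorithm.

```python
# Toy transaction database (hypothetical values): each transaction maps an
# item to the utility it contributes there (e.g. quantity x unit profit).
DB = [
    {"a": 4, "b": 6},
    {"a": 2, "c": 8},
    {"a": 3, "b": 3, "c": 6},
    {"c": 5},
]


def supporting(db, itemset):
    """Transactions that contain every item of the itemset."""
    return [t for t in db if all(i in t for i in itemset)]


def support(db, itemset):
    """Number of transactions that contain the itemset."""
    return len(supporting(db, itemset))


def utility(db, itemset):
    """Total utility of the itemset over all supporting transactions."""
    return sum(sum(t[i] for i in itemset) for t in supporting(db, itemset))


def utility_occupancy(db, itemset):
    """Assumed definition: mean over supporting transactions of
    (utility of the itemset in t) / (total utility of t)."""
    txs = supporting(db, itemset)
    if not txs:
        return 0.0
    shares = [sum(t[i] for i in itemset) / sum(t.values()) for t in txs]
    return sum(shares) / len(shares)


def kulc(db, itemset):
    """Assumed definition: mean of sup(X) / sup({i}) over items i in X.
    Transactions containing none of the items do not affect the value."""
    sup_x = support(db, itemset)
    if sup_x == 0:
        return 0.0
    return sum(sup_x / support(db, (i,)) for i in itemset) / len(itemset)


if __name__ == "__main__":
    x = ("a", "b")
    print(support(DB, x), utility(DB, x), utility_occupancy(DB, x), kulc(DB, x))
```

Under these assumed definitions, the itemset {a, b} has support 2, utility 16, and utility occupancy 0.75 in the toy data. A miner of the kind described above would avoid this exhaustive recomputation by storing such values in compact structures and pruning with upper bounds.
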
Keywords/Search Tags: data mining, utility mining, high-utility pattern, sequential pattern, high-utility episode