Research And Parallel Processing Of Top-k High Utility Pattern Mining Algorithm Based On Projection Table Structure

Posted on: 2018-05-15
Degree: Master
Type: Thesis
Country: China
Candidate: Q Wu
GTID: 2348330518982365
Subject: Computer application technology
Abstract/Summary:
The popularity and rapid development of computer and Internet technology have made data generation, collection, and storage increasingly convenient, so the amount of data is exploding. However, because of this information overload, people often struggle to make sense of massive data. Frequent pattern mining was therefore proposed to uncover the internal relationships between things, and it is widely used in commodity recommendation, disease diagnosis, intrusion detection, and similar applications. However, frequent pattern mining considers only how often a pattern appears in the transactional database and ignores the weight values of the items that make up the pattern. High utility pattern mining was proposed to address this: it takes into account both the weight information and the frequency of each item, which gives it greater practical significance. However, users must set a minimum utility threshold before mining high utility patterns, and choosing that threshold depends on the user's experience. For inexperienced users, inappropriate threshold settings make the mining results vary widely. In addition, in practical applications people tend to care only about the k patterns with the highest utility. Top-k high utility pattern mining was therefore proposed. In Top-k high utility pattern mining, users only need to set the value of k, so experience no longer plays a major role in setting a threshold, which reduces the difficulty of applying high utility pattern mining. At present, however, Top-k high utility pattern mining algorithms still have some problems, such as a slow rise of the temporary utility threshold, poor time performance, and poor scalability.
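To make the difference between frequency and utility concrete, and to show how a temporary utility threshold rises during a Top-k search, the following is a minimal Python sketch. It is a brute-force illustration only, not the algorithm proposed in the thesis; the toy database, the profit table, and the function names `utility` and `top_k_patterns` are all hypothetical.

```python
import heapq
from itertools import combinations

# Hypothetical toy database: each transaction maps item -> purchased quantity.
TRANSACTIONS = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "c": 3},
    {"b": 1, "c": 1, "d": 4},
    {"a": 1, "b": 1, "d": 1},
]
# Hypothetical unit profit (weight) of each item.
PROFIT = {"a": 5, "b": 2, "c": 1, "d": 3}


def utility(pattern, transactions, profit):
    """Utility of a pattern: quantity * profit of its items, summed over
    every transaction that contains the whole pattern."""
    total = 0
    for tx in transactions:
        if all(item in tx for item in pattern):
            total += sum(tx[item] * profit[item] for item in pattern)
    return total


def top_k_patterns(transactions, profit, k):
    """Brute-force Top-k search (assumption: exhaustive enumeration, which
    is only feasible for tiny inputs). Returns (top patterns, threshold)."""
    items = sorted({item for tx in transactions for item in tx})
    heap = []       # min-heap of (utility, pattern); heap[0] is the k-th best
    threshold = 0   # temporary utility threshold, raised as the heap fills
    for size in range(1, len(items) + 1):
        for pattern in combinations(items, size):
            u = utility(pattern, transactions, profit)
            if u == 0:
                continue  # the pattern never occurs in the database
            if len(heap) < k:
                heapq.heappush(heap, (u, pattern))
            elif u > heap[0][0]:
                heapq.heapreplace(heap, (u, pattern))
            if len(heap) == k:
                threshold = heap[0][0]
    return sorted(heap, reverse=True), threshold
```

With this toy data and k = 3, the search ends with a threshold of 19: the best patterns are `("a",)` with utility 20, and `("a", "c")` and `("b", "d")` with utility 19 each. Note that `("b",)` alone has utility 8 while its superset `("b", "d")` has utility 19, so utility is not anti-monotone the way frequency is.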
In view of these shortcomings, this paper proposes a projection-based Top-k high utility pattern mining algorithm. In addition, to address the low efficiency of single-machine mining on massive data, a distributed solution for Top-k high utility pattern mining based on MapReduce is proposed. The main work of this paper is as follows:
1. A Top-k high utility pattern mining algorithm based on projection is proposed, namely TKHUP. TKHUP is a one-phase Top-k high utility pattern mining algorithm. By adopting the projection structure, TKHUP can directly obtain the exact utility of patterns and quickly raise the temporary utility threshold, so as to efficiently mine the specified number of high utility patterns.
2. A distributed Top-k high utility pattern mining algorithm based on MapReduce is proposed, namely TKHUP-MaR. In this paper, we study and use MapReduce parallel technology to mine Top-k high utility patterns in large-scale data. We implement the parallel algorithm in three stages: parallel computing, parallel establishment of the storage structure, and parallel mining.
3. Five strategies are designed to improve the efficiency of the algorithm. The CSD strategy merges projection structures that share the same prefix pattern, thus saving memory space. The QPPR strategy speeds up the building of the projection structure by using the sum of the numeric values of the prefix items to quickly compare prefix patterns and check whether they are the same. The DS strategy preferentially mines the base patterns with higher utility, increasing the growth rate of the temporary utility threshold. The DFP strategy adopts a depth-first mining approach to construct the sub-projection structures iteratively, which rapidly raises the temporary utility threshold. The DPUP strategy uses the transaction-weighted downward closure property to prune low utility patterns in the projection, which speeds up the mining process.
4. The experimental results show that the TKHUP algorithm performs very well in terms of running time and memory usage. In addition, the feasibility and scalability of the TKHUP-MaR algorithm are verified by experiments on the Hadoop platform.
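The transaction-weighted downward closure property used by the DPUP strategy can be illustrated with a short Python sketch. The transaction-weighted utility (TWU) of a pattern is the sum of the total utilities of all transactions that contain it, so it is an upper bound on the utility of the pattern and of every superset. This is only a sketch of the general property applied to single items, not the thesis's projection-based pruning; the toy data and the names `transaction_utility`, `twu`, and `promising_items` are hypothetical.

```python
# Hypothetical toy database: each transaction maps item -> purchased quantity.
TRANSACTIONS = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "c": 3, "e": 1},
    {"b": 1, "c": 1, "d": 4},
    {"a": 1, "b": 1, "d": 1},
]
# Hypothetical unit profit (weight) of each item.
PROFIT = {"a": 5, "b": 2, "c": 1, "d": 3, "e": 1}


def transaction_utility(tx, profit):
    """Total utility of one transaction: sum of quantity * profit."""
    return sum(qty * profit[item] for item, qty in tx.items())


def twu(pattern, transactions, profit):
    """Transaction-weighted utility: sum of the utilities of all
    transactions containing the pattern. It bounds from above the
    utility of the pattern and of all its supersets."""
    return sum(
        transaction_utility(tx, profit)
        for tx in transactions
        if all(item in tx for item in pattern)
    )


def promising_items(transactions, profit, threshold):
    """Keep only items whose TWU reaches the current threshold. Any
    pattern containing a discarded item cannot reach the threshold."""
    items = {item for tx in transactions for item in tx}
    return sorted(i for i in items if twu((i,), transactions, profit) >= threshold)
```

With a temporary threshold of 19, item `e` has a TWU of 14 and is discarded, so no pattern containing `e` needs to be examined; items `a`, `b`, `c`, and `d` remain candidates.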
Keywords/Search Tags: Utility pattern, Top-k high utility pattern, Pattern mining, Parallel, MapReduce