
Research On Association Rule Mining Algorithm Based On Time-stamp And Vertical Format

Posted on: 2020-10-20
Degree: Master
Type: Thesis
Country: China
Candidate: J J Ma
Full Text: PDF
GTID: 2428330602986952
Subject: Computer Science and Technology
Abstract/Summary:
With the development of computer technology and the spread of the Internet, data plays an increasingly important role in daily life, social production, and scientific inquiry. Extracting useful information from massive data helps us make correct decisions, and that extraction is the task of data mining. This paper studies association rule mining, a popular data mining technique that aims to uncover hidden associations between data items. The algorithm proposed here is an improved version of the specific later-marketed consequent mining (SLMCM) algorithm. The notion of later-marketed commodities can be extended to any new items added to the database, so the algorithm allows for data updates and suits practical applications. The key step of SLMCM is attaching a time-stamp to each item, so it is also called the time-stamp based association rule algorithm. However, SLMCM executes very inefficiently, which makes it unsuitable for current big data settings. To address this problem, this paper proposes the following improvements:

(1) An improved algorithm, E-SLMCM, which adopts a vertical structure and needs only one traversal of the database to convert it into vertical format. During this conversion, the time-stamp of each item can be recorded directly from the time the item first appears, so the items of each transaction need not be sorted by time-stamp as in the original algorithm, which saves time. In addition, the algorithm processes the set enumeration tree in ascending order, and the efficiency is doubled.

(2) To improve efficiency on dense databases, the DE-SLMCM algorithm is proposed, which extends E-SLMCM with Diffsets. Here, the set enumeration tree is processed in descending order.

(3) To adapt to big data mining, the E-SLMCM and DE-SLMCM algorithms are parallelized, yielding the SPE-SLMCM and SPDE-SLMCM algorithms, which are built on the popular Spark distributed framework. Because the algorithms adopt a vertical structure, combinations of item-sets with different prefixes are generated separately and do not affect each other when candidate item-sets are produced. Multithreading can therefore be used conveniently for distributed processing, which makes the algorithms better suited to big data settings.
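
The one-pass vertical conversion and the Diffset idea described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function names, the toy transactions, and the use of the transaction index as a stand-in for the time-stamp are all assumptions made for illustration. The Diffset step follows the standard identity sup(AB) = sup(A) - |t(A) - t(B)|, where t(X) is the set of transaction ids containing X.

```python
from collections import defaultdict

def to_vertical(transactions):
    """Single pass over the database (hypothetical helper).

    Returns item -> tidset (set of transaction ids) and
    item -> time-stamp, where the time-stamp is assumed to be
    the id of the first transaction in which the item appears.
    """
    tidsets = defaultdict(set)
    first_seen = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets[item].add(tid)
            # Record the time-stamp only on the item's first appearance.
            first_seen.setdefault(item, tid)
    return dict(tidsets), first_seen

def support_via_tidsets(tidsets, a, b):
    """Support of {a, b} as the size of the tidset intersection."""
    return len(tidsets[a] & tidsets[b])

def support_via_diffset(tidsets, a, b):
    """Support of {a, b} via the diffset t(a) - t(b)."""
    diffset = tidsets[a] - tidsets[b]
    return len(tidsets[a]) - len(diffset)

# Toy database (illustrative data only).
transactions = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["milk", "butter", "phone"],
    ["bread", "milk", "phone"],
]
tidsets, first_seen = to_vertical(transactions)
```

On dense data the diffset is typically much smaller than the tidset intersection it replaces, which is the motivation usually given for the technique; both functions return the same support count.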
Keywords/Search Tags: Data mining, Association rules, Time-stamp, Vertical structure, Parallel mining