Research On Similarity Query And Pattern Mining Algorithms Over Data Stream

Posted on: 2017-02-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S P Wang
Full Text: PDF
GTID: 1318330542477135
Subject: Computer application technology
Abstract/Summary:
In recent years, with the rapid development of wireless communication, embedded computing, and microelectronics, Wireless Sensor Networks (WSN) have come into wide use in health care, industrial monitoring, military defense, and other fields. The development of the IoT has made this trend even more pronounced. A WSN is a data-centric network. Its data is rapid, unbounded, and varied, which are the characteristics of a data stream. Extracting knowledge from such data is essential for responding effectively to the state of the monitored region. Traditional data mining generally processes static data held on a storage medium and is hardly applicable to data stream scenarios. Research on data mining and data analysis over data streams is therefore very important, and is a key problem in WSN-related research.

A Data Stream Management System is a data management platform built for managing data streams and developing applications on them. As sensor networks are applied more widely, the sources and volume of data streams, and the applications that depend on the knowledge they contain, keep increasing. It is therefore necessary to integrate data stream mining algorithms into the data stream management system so that users are provided with callable mining services. Based on a review and analysis of existing research at home and abroad, this dissertation conducts an in-depth study of two important issues in data stream mining, similarity query and pattern mining, in order to serve the data stream management system. The main contributions of this dissertation are as follows:

(1) For the precise Disjoint query based on the sliding window, which is the subsequence similarity query over data streams under DTW, an algorithm with incremental computation, called DQPIC, is presented. The algorithm uses the existing FSM algorithm to obtain the query results within the first window. From the second window onward, it derives the query results of each window from those of the immediately preceding window, and can thereby omit the processing costs of many data stream elements in each window. Simulation results on the commonly used data sets SST and Maskedchirp show that DQPIC returns the same results as the existing algorithms, and improves time efficiency by 2.5 to 25 times at the cost of increasing space cost by 1.12 to 3.27 times. Because a data stream management system is usually deployed on a gateway or in the cloud, where space resources are sufficient, the DQPIC algorithm has a clear advantage in time and can be used in a data stream management system.

(2) For the whole sequence similarity query over data streams under LCSS, a processing algorithm called D2S-PC is presented. The algorithm defines the PS and CC domains of the dynamic matrix over every window. By exploiting the characteristics of the similarity query and of the matrix members in these two domains, it omits the redundant computation of many unnecessary matrix elements. Simulation results show that the D2S-PC algorithm returns the same results as the existing algorithms. Especially when handling similarity queries with high accuracy requirements, it has better time efficiency and nearly the same space efficiency.

(3) The current definition of the weighted maximal frequent pattern cannot describe weighted maximal frequent patterns whose frequency threshold and weighted frequency threshold differ. To address this, the concept of the full weighted maximal frequent pattern (FWMFP) is proposed, and the FWMFP-SW algorithm, which mines FWMFPs over data streams based on the sliding window, is presented. By introducing an optimization strategy based on the frequency constraint into the BM algorithm and using the edit distance ratio as the decision function for WMFP-SW-tree reconstruction, the algorithm effectively reduces the number of calls to the MaxW optimization policy and the number of WMFP-SW-tree reconstructions in the BM algorithm. Experimental results show that the FWMFP-SW algorithm is effective. Compared with the existing algorithms, it has better time efficiency and the same space cost, and its mechanism for deciding on WMFP-SW-tree reconstruction performs best when the reconstruction threshold is 0.2.

(4) Little existing research focuses on regular pattern mining over data streams. Maximal regular pattern mining over data streams based on the landmark window is studied for the first time, and the DSMRM-BLW algorithm for mining this pattern is presented. The algorithm builds on the HA-C algorithm and achieves incremental computation through the boundary landmark window technique. It obtains the maximal regular patterns within the first window with the HA-C algorithm. From the second window onward, the mining results of each window are derived from those of the immediately preceding window, which eliminates the need for complete re-mining in each window. Simulation results show that the DSMRM-BLW algorithm returns the same results as the HA-C algorithm, with better time and space efficiency.
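
For readers unfamiliar with the two similarity measures named in contributions (1) and (2), the following is a minimal Python sketch of textbook DTW and LCSS, together with a naive sliding-window query that recomputes every window from scratch. It is not DQPIC, FSM, or D2S-PC, whose details the abstract does not give; it only illustrates the kind of per-window recomputation that incremental algorithms aim to avoid. The function names, the absolute-difference cost, and the parameters `eps` and `min_sim` are assumptions made for illustration.

```python
import math


def dtw_distance(a, b):
    """Textbook DTW distance with |x - y| as the local cost (an assumption)."""
    # prev[j] holds the DTW cost of aligning the processed prefix of a with b[:j].
    prev = [0.0] + [math.inf] * len(b)
    for x in a:
        cur = [math.inf]
        for j, y in enumerate(b, 1):
            cur.append(abs(x - y) + min(prev[j], cur[j - 1], prev[j - 1]))
        prev = cur
    return prev[-1]


def lcss_length(a, b, eps):
    """Length of the longest common subsequence, where two values match
    if they differ by at most eps (hypothetical tolerance parameter)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            if abs(x - y) <= eps:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def lcss_similarity(a, b, eps):
    """LCSS length normalised by the shorter sequence length."""
    return lcss_length(a, b, eps) / min(len(a), len(b))


def naive_window_query(stream, query, window, eps, min_sim):
    """Return the start index of every sliding window whose LCSS similarity
    to the query is at least min_sim. Each window is recomputed in full,
    so the cost grows with the number of windows times the matrix size."""
    hits = []
    for start in range(len(stream) - window + 1):
        segment = stream[start:start + window]
        if lcss_similarity(segment, query, eps) >= min_sim:
            hits.append(start)
    return hits
```

In this baseline, consecutive windows share all but one element, yet the dynamic programming matrix is rebuilt each time. Reusing results from the preceding window, as the abstract describes for DQPIC, targets exactly that redundancy.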
Keywords/Search Tags: data stream, similarity query, DTW, LCSS, frequent pattern mining, regular pattern mining, sliding window, landmark window