Font Size: a A A

Study On Frequent Data Mining From Uncertain Data Streams

Posted on:2013-05-08Degree:DoctorType:Dissertation
Country:ChinaCandidate:K M TangFull Text:PDF
GTID:1268330422952660Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
With the rapid development of computer and communication technology and wide application ofwireless sensor networks, Web service and RFID, uncertain data management has gained a lot ofattention. Uncertain data management plays an important role in many practical scenarios, such aseconomy situation prediction, financial information analysis, ecological environment observing,network security monitoring, logistics management, etc. But the traditional data managementtechnology cannot handle such new type of uncertain data streams effectively. Therefore, designingnew management technology for uncertain data streams draws significant interest from industry andacademia. Thus, the data mining on the uncertain data streams has already become a research hotpot.The data mining on the uncertain data streams mainly focus on the clustering, frequent patternmining, skyline query, data lineage analysis, outlier analysis. Based on the thorough understanding ofrelated work, this paper discusses the state-of-the-art of the frequent data mining of the uncertain datastreams. As the basis of several mining tasks such as association rule, classification and clustering,frequent data mining had an important position in the uncertain data streams mining. Therefore, thispaper investigates the frequent data mining on the uncertain data streams deeply and proposeseffective algorithms for frequent data mining algorithms. The main contributions of the paper are asfollows:(1) We propose a sliding window-based algorithm SWBUFIM for frequent item query on theuncertain data streams. According to the characteristics of frequent item and Markov’s inequality, wegave two prunning rules to omit the items which cannot become frequent on the uncertain datastreams. We employ the dynamic programming method to compute expected probability intime. Since different data items are mutually independent, we present the model which will open subsliding window for different data item and handle the frequent item mining problem by the processingdivisions according to the number of combinations. In addition to using the dynamic programmingmethod, we improve the probability computing algorithm to efficiently compute the expectedprobability in time by only processing k-1rows in the sliding window.(2) We propose a sliding window-based top-k query algorithm MPTopKTS for the uncertain datastreams.According to the characteristics of uncertain data streams along with its sliding window, weinvestigate the top-k query problem and propose the query algorithm for the Top-k tuple sets with therelatively high score value and maximal probability of occurrence. To reduce the time complexity oftop-k query algorithm, this algorithm builds synopsis tables based on sliding window. We alsoadvanced a effective method to update the the synopsis tables in each time step. This algorithm canalso balance between the query accuracy and the time cost, namely, it can gain the high qualityapproximate result at the price of minimal computing overhead. The experiment results demonstratethat our query algorithm is more efficient than the previous work in time and space complexity.(3) We propose a sample mining algorithm MFCIFUDS for detecting the frequent closeditemsets in uncertain data streams based on sliding window.Based on the character of uncertain datastreams frequent closed itemsets, we first transfer the transactions comprised by uncertain data intothe one with certain data by random sampling, and then detect the frequent closed itemsets byemploying the mining technology for the certain data streams. We theoretically prove the feasibility of the algorithm based on the sample technology. In addition, we also propose an improved algorithm forconstructing and updating the frequent pattern tree to speed up the frequent closed itemsets mining.The experiment results demonstrate that our query algorithm, MFCIFUDS has high mining accuracyand proceeding speed.(4) We propose a frequent quantitative interval pattern mining algorithm MFIPatFUS foruncertain data streams based on sliding window. Instead of the regular binary item set transactionuncertain data streams, in the quantitative interval transaction uncertain data streams value of eachtransaction attribute is a quantitative interval. Here, the quantitative interval’s uncertainty is reflectedby the fluctuation of its range, and its distribution demonstrate some type of distribution probability.Utilizing the basic idea of uncertain data streams frequent pattern mining algorithm based on regularfrequent pattern spanning tree, we present an algorithm based on a frequent quantitative intervalpattern spanning tree FIPatTree which is used to capture the quantitative interval information in all ofthe uncertain data streams transactions. FIPatTree uses the boundary values of the initial interval asbase elements, and then establishes quantitative intervals according to the base element distribution.On the other hand, the probability of each base quantitative interval can be computed by theproportion of base numerical range in the original numerical range. Algorithm MFIPatFUS employsthe sliding window model and utilizes FIPatTree as the synoptic data structure. The transactionattributes are stored in FIPatTree where each node represents a base quantitative interval. Theconstruction of tree is similar to that of the regular frequent pattern spanning tree with the exceptionthat nodes can share only when base quantitative intervals and occurrence probabilities of theattributes are identical. We set frequency and partial probability statistics for the shared nodes. Wealso set indexes of the attributes and the base quantitative intervals on the FIPatTree so as to updateand traverse on the FIPatTree. The experiment results demonstrate that the sliding window modelbased mining algorithm MFIPatFUS can effectively detect the frequent quantitative interval patternmining for the uncertain data streams with quantitative interval transactions.
Keywords/Search Tags:uncertain data streams, frequent item queries, top-k queries, frequent closed itemsetmining, frequent quantitative interval pattern mining, sliding window model, samplemining model
PDF Full Text Request
Related items