Font Size: a A A

A Learned Dynamic Cuckoo Filter For Approximate Membership Queries Over Variable-sized Sliding Windows On Data Streams

Posted on:2024-04-15Degree:MasterType:Thesis
Country:ChinaCandidate:T Y YanFull Text:PDF
GTID:2568307067973249Subject:Computer technology
Abstract/Summary:PDF Full Text Request
With the rapid development of mobile Internet and the Io T,massive amounts of data are generated every minute.Including browsing records on various mobile applications,usage records of smart homes,and traffic logs of network nodes,among others.This massive data flow contains valuable information,such as user behavior patterns,recent preference trends,and hot topics of interest.In addition to analyzing users,smart device manufacturers can continuously collect sensor data to improve their products,website owners can continuously monitor homepage traffic to prevent malicious attacks,and end devices can analyze logs to optimize routing.Data mining of data streams can bring convenience to various scenarios and create more value.Data analysis and mining of data streams have become an urgent need in more and more application scenarios.However,data streams have the characteristics of unboundedness and ordering.As long as the devices generating data are not down,new data will continue to be generated.Moreover,data stream analysis has strong real-time requirements,and recently generated data often has higher value.However,existing storage devices do not support recording all data in all scenarios.Data analysis of data streams is generally query-oriented,including whether a certain record exists,how many times a certain data appears,or which are the most frequently occurring records.The most common query is the membership query,which determines whether a data element exists.Combining the real-time characteristics of data stream applications,it is urgent to have a data structure that can accurately answer whether a certain record exists within a certain time period,so as to achieve better and faster analysis of data streams.At the same time,it is necessary to support queries for different time periods to adapt to complex application scenarios.Existing research and algorithms have not solved the above problems,so this paper proposes the following research content:(1)This paper proposes a new learned dynamic Cuckoo Filter structure that can effectively support the insertion of an infinite number of data elements,and can answer queries about the existence of data elements with extremely low time and space overhead and controllable error rates for any query window size.(2)This paper rigorously analyzes and proves the proposed structure,including the error rates and parameter settings and analysis of the structure under various usage scenarios.The effectiveness and correctness of the structure are rigorously proved by theoretical analysis.(3)This paper conducts comprehensive comparative experiments on two real-world datasets and two synthetic datasets with different distributions and existing state-of-the-art algorithms in two different scenarios.The experimental results show that the proposed structure can effectively solve the above problems and is optimal in terms of time and space consumption as well as error rate indicators compared to existing algorithms.The correctness of the theoretical analysis is also proved.
Keywords/Search Tags:Cuckoo Filter, Learned Index, Probabilistic Data Structures, Data mining, Data Stream Applications
PDF Full Text Request
Related items