Research on User Behavior Mining Technology Based on Data Stream Management

Posted on: 2013-09-05 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: J Li
GTID: 1228330374999637 | Subject: Computer Science and Technology
Abstract/Summary:
Recently, network threats have raised a variety of security challenges that call for analysis of network data streams. Network behavior data streams contain not only large volumes of continuous data but also users' browsing and search behavior data. Analyzing user behavior data streams therefore requires addressing three challenges: user information, content information, and system management. The contributions of this dissertation are fourfold:
1. From the user dimension, a new user behavior prediction model is proposed that categorizes users into different behavior categories. The model collects user behavior data, including web click data and search keywords, and mines the relationship between user behavior and security class labels. Compared with existing user behavior analysis models, the proposed method (i) predicts users' behavior categories, using probabilistic latent semantic analysis (PLSA) to discover the tendencies of user behaviors; (ii) builds a mapping function between user tendency labels and behavior labels; and (iii) uses a new metric to measure the utility of the model. Experiments demonstrate that the model accurately predicts user behavior class labels without labeling costs.
2. From the content dimension, a fast ensemble prediction model is proposed, which uses multiple classifiers to predict the class label of each incoming stream record. Although ensemble models are accurate and stable, their prediction efficiency drops sharply as the number of base classifiers in the ensemble grows. We therefore propose an ensemble indexing method that exploits the patterns shared among the base classifiers to reduce the prediction cost. Specifically, two indexing models, E-Tree and SVM-Index, are proposed, achieving sub-linear prediction time of O(log N) and O(1), respectively.
Experiments on UCI datasets demonstrate that the two models reduce the prediction cost to 25% and 3%, respectively, of that of the original models.
3. From the system perspective, data mining and machine learning methods are used to construct an adaptive filter framework. For stable stream environments, K-means clustering is used to build a hierarchical ordering model, KHO, which improves the robustness of the filter-ordering algorithm. For unstable data streams, a new model, AHES, is proposed based on the Analytic Hierarchy Process (AHP) and exponential smoothing. Both methods incorporate the context information of data streams into filter ordering. Experiments demonstrate the utility of the models.
4. Building on these three key techniques, a new data stream management engine, IceStream, is designed, and its key modules and functions are described.
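Item 3 names exponential smoothing as one ingredient of adaptive filter ordering but does not give the AHES formulation. The Python below is a minimal sketch under stated assumptions, not the dissertation's method: it keeps an exponentially smoothed pass-rate estimate for each filter and re-sorts commutative, independent filters by cost / (1 - pass rate), a standard greedy ordering rule that runs cheap, highly selective filters first. The AHP weighting and the KHO clustering step are not modeled; `StreamFilter`, `SmoothedFilterOrderer`, the cost values, and `alpha` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class StreamFilter:
    """A commutative filter over stream records (hypothetical example type)."""
    name: str
    predicate: Callable[[dict], bool]
    cost: float             # assumed relative cost of one evaluation
    pass_rate: float = 0.5  # exponentially smoothed estimate of P(pass)


class SmoothedFilterOrderer:
    """Sketch of adaptive filter ordering (not the AHES model itself).

    Filters are sorted by rank = cost / (1 - pass_rate), lowest first, the
    classic greedy order for independent filters.
    """

    def __init__(self, filters: Sequence[StreamFilter], alpha: float = 0.2):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.filters: List[StreamFilter] = list(filters)
        self.alpha = alpha

    @staticmethod
    def rank(f: StreamFilter) -> float:
        drop_rate = 1.0 - f.pass_rate
        return f.cost / drop_rate if drop_rate > 0 else float("inf")

    def reorder(self) -> None:
        """Re-sort filters using the current smoothed estimates."""
        self.filters.sort(key=self.rank)

    def process(self, record: dict) -> bool:
        """Evaluate filters in the current order, stopping at the first rejection.

        Only filters that were actually evaluated have their estimates updated.
        """
        for f in self.filters:
            passed = f.predicate(record)
            # Simple exponential smoothing: s <- alpha * x + (1 - alpha) * s
            f.pass_rate = self.alpha * float(passed) + (1.0 - self.alpha) * f.pass_rate
            if not passed:
                return False
        return True


# Usage: the expensive filter almost always passes, the cheap one almost always rejects.
filters = [
    StreamFilter("expensive_url_check", lambda r: r["url"].startswith("https"), cost=5.0),
    StreamFilter("cheap_port_check", lambda r: r["port"] == 443, cost=1.0),
]
orderer = SmoothedFilterOrderer(filters, alpha=0.2)
results = [orderer.process({"url": "https://example.org", "port": 80}) for _ in range(20)]
orderer.reorder()
order = [f.name for f in orderer.filters]
print(order)  # ['cheap_port_check', 'expensive_url_check']
```

Because rejected records never reach later filters, the estimates of filters moved to the back stop updating; a practical version would need some way to re-sample them, which this sketch omits.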
Keywords/Search Tags: Data Stream Analysis, Data Stream Management, User Behavior, Ensemble Learning, Shared Filter Ordering