
Research On Stream Mining Techniques And Applications In Telecommunications

Posted on: 2012-08-11
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L C Guo
Full Text: PDF
GTID: 1118330371457842
Subject: Control Science and Engineering
Abstract/Summary:
With the rapid development of computer, internet and digital communication technology, the data stream has become one of the main data models in the telecommunication industry. Compared with traditional data models, data streams are ordered, unbounded and arrive at a high rate, so they can hardly be handled by traditional static data mining techniques. This poses challenges for data mining techniques and their applications in the telecommunication industry.

In consideration of practical telecom business problems, a global stream mining method is used instead of the existing partial stream mining methods. The thesis proposes a novel GGACFI-MFW algorithm to find frequent itemsets over global data streams based on the max-frequency window model, a new CBMP algorithm to build an online classification model for data streams with concept drift based on frequent patterns, a novel MM-FTP algorithm to monitor and predict the frequency tendency of itemsets over data streams based on the Max-Min-Frequency window model, and a Data Stream Management System for Telecommunication to monitor and analyze the data streams of telecommunication companies online.

The research issues and contributions of this thesis are as follows:

Ⅰ. A novel GGACFI-MFW (Generating Global Approximate Closed Frequent Itemsets on Max-Frequency Window model) algorithm is proposed for mining frequent itemsets over global data streams based on the max-frequency window model. An MFP-Tree (Max-Frequency Pattern Tree) structure is designed to store summary information over the global stream, which guarantees each itemset an independent max-frequency window. An efficient Selective Updating Mechanism (SUM) is built to update the MFP-Tree as the stream flows. The efficiency and effectiveness of the proposed algorithm are illustrated by case studies on both simulated data streams and a practical web log data stream.

Ⅱ. To cope with the classification of data streams with concept drift, a novel CBMP (Classification Based on Max-frequency Pattern) algorithm is proposed. The summary information is stored in a CMFP-Tree (Classification on Max-Frequency Pattern Tree). The fuzzy classifier is updated online, which guarantees real-time classification of streaming data. Case studies comparing the CBMP algorithm with the CMAR, CAPE and CBC-DS algorithms confirm its high performance in both precision and efficiency.

Ⅲ. For the frequency tendency prediction of itemsets over streams, a novel MM-FTP (Max-Min-Frequency Tendency Prediction) algorithm is proposed. An MMFP-Tree (Max-Min-Frequency Pattern Tree) structure, based on the MFP-Tree structure, is established to store summary information. A new measure, FCR (Frequency Changing Rate), is presented to describe the tendency of the itemset frequency. With a proper transformation, the MM-FTP algorithm can also be used for index tendency prediction problems. Case studies on a web log data stream illustrate the performance of the proposed algorithm in both efficiency and effectiveness.

Ⅳ. To cope with online analysis problems over streaming data, a novel Data Stream Management System for Telecommunication is proposed, based on previous work on a data mining methodology and a business intelligence system in the telecom industry. The GGACFI-MFW, CBMP and MM-FTP algorithms are successfully applied to the online cross-selling analysis of telecom packages, the online prediction of customer arrears and the online prediction of customer loss, respectively.
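
The abstract does not define the max-frequency window. As a minimal sketch, assuming the common definition in which an itemset's max-frequency is its highest relative frequency over all windows that end at the newest transaction, the idea can be illustrated as follows. The function name `max_frequency`, the `min_window` parameter and the brute-force scan are illustrative assumptions; this is not the GGACFI-MFW algorithm or the MFP-Tree, which the thesis uses to avoid rescanning the stream.

```python
from typing import FrozenSet, List, Tuple


def max_frequency(
    stream: List[FrozenSet[str]],
    itemset: FrozenSet[str],
    min_window: int = 1,
) -> Tuple[float, int]:
    """Return (best frequency, window length) over all windows that
    end at the newest transaction and span at least `min_window`
    transactions. Assumed definition, not taken from the thesis."""
    best_freq, best_len, hits = 0.0, 0, 0
    # Grow the window backwards from the most recent transaction.
    for length, transaction in enumerate(reversed(stream), start=1):
        if itemset <= transaction:  # itemset is contained in the transaction
            hits += 1
        if length >= min_window:
            freq = hits / length
            if freq > best_freq:  # ties keep the shorter window
                best_freq, best_len = freq, length
    return best_freq, best_len


if __name__ == "__main__":
    stream = [
        frozenset("ab"),
        frozenset("c"),
        frozenset("a"),
        frozenset("ab"),
        frozenset("ab"),
    ]
    print(max_frequency(stream, frozenset("ab"), min_window=3))
```

Because each itemset gets its own best window, a pattern that became popular only recently is not diluted by older transactions, which is the property the abstract attributes to the independent max-frequency window.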
Keywords/Search Tags: data mining, data stream, frequent itemsets, classification, tendency prediction, telecommunication