
Application Of Data Pre-processing Method In The Mobile Telecommunication Industry

Posted on: 2011-03-21 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Dong | Full Text: PDF
GTID: 2178360308473342 | Subject: Management Science and Engineering
Abstract/Summary:
Real-world data are frequently incomplete, inconsistent, noisy, or missing, so pre-processing is an important step before data mining. The main research topics in data pre-processing are data cleansing, data integration, data transformation and data reduction. This dissertation first analyzes the various data pre-processing methods. The churn data of the mobile telecommunication industry studied here comprise 70 tables, 400 attributes and 2,000 thousand records, with a maximum missing-data rate of 28.3%; more than 10 kinds of data pre-processing methods were applied to them. This dissertation includes the following contents:
(1) The concept of data quality issues is presented first, and the data pre-processing methods corresponding to each data quality problem are then summarized.
(2) Given the 28.3% rate of missing data, this dissertation abandons the traditional practice of simple deletion and uses data imputation instead. After comparing various imputation methods, a multiple imputation algorithm is adopted. Because the volume of data to be imputed, 683,715, counts as large-scale data, a small-sample experiment was run first to ensure the quality of the imputation; the best number of imputations was determined by comparing the imputation results, and the full imputation was then carried out. The resulting dataset is complete and close to reality.
(3) After multiple imputation, attribute subset selection, attribute integration, attribute construction, data discretization, data normalization, data sampling and other methods, the data are applied to a data mining model. The resulting information was approved by the customer, which suggests that the data pre-processing was effective and significant.
Keywords/Search Tags:data preprocessing, missing data, multiple imputation, telecommunication data
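
The multiple imputation step summarized in item (2) can be illustrated with a minimal sketch. The abstract does not say which imputation model or software the dissertation used, so everything here is an assumption: the function names are hypothetical, the imputation model is an approximate Bayesian bootstrap (a simple hot-deck variant), and the toy data are invented for illustration. The sketch fills the missing values m times, estimates the mean on each completed dataset, and pools the results with Rubin's rules.

```python
import random
import statistics


def impute_once(values, rng):
    """Return a copy of `values` with every None filled in.

    Approximate Bayesian bootstrap (an assumed model, not the thesis's):
    first resample the observed values with replacement, then draw each
    missing value from that resampled donor pool.
    """
    observed = [v for v in values if v is not None]
    donors = rng.choices(observed, k=len(observed))
    return [v if v is not None else rng.choice(donors) for v in values]


def multiple_imputation_mean(values, m=5, seed=0):
    """Estimate the mean from m imputed datasets, pooled by Rubin's rules.

    Returns (pooled_estimate, total_variance).
    """
    rng = random.Random(seed)
    n = len(values)
    estimates, variances = [], []
    for _ in range(m):
        completed = impute_once(values, rng)
        estimates.append(statistics.fmean(completed))
        # Squared standard error of the mean on this completed dataset.
        variances.append(statistics.variance(completed) / n)
    pooled = statistics.fmean(estimates)
    within = statistics.fmean(variances)
    between = statistics.variance(estimates) if m > 1 else 0.0
    total = within + (1 + 1 / m) * between
    return pooled, total


if __name__ == "__main__":
    # Toy column with missing entries (None); not data from the thesis.
    column = [1.0, 2.0, None, 4.0, None, 6.0, 3.0, None]
    for m in (3, 5, 10):
        estimate, variance = multiple_imputation_mean(column, m=m, seed=1)
        print(m, round(estimate, 3), round(variance, 3))
```

Running the loop for several values of m on a small sample mirrors, in spirit, the small-sample experiment the abstract describes for choosing the number of imputations before processing the full dataset.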