
Design And Implementation Of Customer Information Cleaning In CRM System

Posted on: 2016-03-14
Degree: Master
Type: Thesis
Country: China
Candidate: W Yang
Full Text: PDF
GTID: 2348330503492547
Subject: Software engineering
Abstract/Summary:
With the deepening development of the market economy, competition among industries and brands has become increasingly fierce. Continuously improving the relationship with customers contributes greatly to an enterprise's competitiveness, helping it win new customers, retain existing ones, and increase customer profitability. Against this background, customer relationship management (CRM) came into being. CRM, in turn, cannot do without a unified information integration platform, which requires uniform standards and high-quality customer information. Building a unified data platform means organizing data scattered across multiple organizations by subject and loading it into a data warehouse, and this is precisely one of the thorny problems encountered when building a data warehouse. Cleaning business data with data cleaning technology has therefore become a research focus in the data warehousing field in recent years; its main tasks are to eliminate inconsistent and erroneous data and to remove duplicate and redundant information.

The paper first describes the basic concepts of data quality, its evaluation indicators, and the classification of data quality problems. On this basis it introduces the basic concepts of data cleaning, the design of data cleaning metadata, the data cleaning process and strategy, and the design of the overall data cleaning framework. For the identified data quality problems, an overall cleaning process is given that divides data cleaning into three steps: data preprocessing, attribute-level cleaning, and duplicate-record cleaning, using a mixed strategy that combines manual and automated cleaning. For dirty data detection, most dirty data is cleaned automatically: cleaning rules are defined, and an ETL tool parses and applies them. Duplicate-record cleaning, especially of Chinese customer information, is the other important problem this paper addresses.

Through research and analysis, the paper finds that customer unit names and addresses, as important items of customer information, follow relatively fixed composition rules: a unit name typically consists of an administrative division name, a distinctive name, an industry, an organizational form, and similar parts, while an address basically contains province, city, district, street, and similar components. A knowledge base defined with metadata and feature characters is therefore used to split unit names and address information, so that the resulting components can be compared for accuracy and consistency more conveniently. At the same time, considering that a unit's full name and one or more abbreviations often coexist, the paper draws on system-building experience and knowledge of customer information to record the mapping between full names and abbreviations, which works well for standardizing and splitting customer name data.
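As an illustration of this kind of feature-character splitting, the following Python sketch breaks a Chinese address into province/city/district/street parts and strips an organizational-form suffix from a unit name. The suffix list, the regular expression, and the sample strings are illustrative assumptions standing in for the metadata-driven knowledge base described in the thesis.

import re

# Hypothetical feature characters marking the end of each address component;
# the thesis's knowledge base would supply these from metadata instead.
ADDRESS_PATTERN = re.compile(
    r"(?P<province>.+?(省|自治区|市))?"
    r"(?P<city>.+?市)?"
    r"(?P<district>.+?(区|县))?"
    r"(?P<street>.+)?"
)

# Hypothetical organizational-form suffixes used to split a unit name.
ORG_FORMS = ["有限责任公司", "股份有限公司", "有限公司", "公司", "厂", "研究所"]

def split_address(address: str) -> dict:
    """Split an address into province / city / district / street parts."""
    m = ADDRESS_PATTERN.match(address)
    return {k: (v or "") for k, v in m.groupdict().items()}

def split_unit_name(name: str) -> dict:
    """Split a unit name into its distinctive body and organizational form."""
    for form in ORG_FORMS:
        if name.endswith(form):
            return {"body": name[: -len(form)], "org_form": form}
    return {"body": name, "org_form": ""}

if __name__ == "__main__":
    print(split_address("江苏省南京市玄武区中山路100号"))
    print(split_unit_name("南京某某软件有限公司"))

Splitting names and addresses into aligned components in this way lets later comparison steps match field against field (district against district, organizational form against organizational form) instead of comparing whole strings.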
In the process of cleaning duplicate customer information, different attributes contribute differently to identifying a unique customer, so different attributes are given different weights. When handling candidate duplicate records within a cluster, the fields of two records are compared with similarity functions suited to each field's characteristics; the field similarities are then combined with the field weights into an aggregate record similarity, which is compared against a matching threshold to decide whether the two records match. For clustering the duplicate data, an improved priority-queue algorithm is used to group similar duplicate customer information, and the final normalization (merging) of customers is confirmed manually, which effectively safeguards the company's customer resources.
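The Python sketch below illustrates the general idea of weighted field similarity with a matching threshold, followed by a simplified bounded-queue clustering pass over sorted records in the spirit of priority-queue duplicate detection. The weights, similarity function, threshold, queue size, and sample data are illustrative assumptions, not the thesis's improved algorithm or its actual parameters.

from difflib import SequenceMatcher

# Hypothetical per-field weights reflecting how strongly each attribute
# identifies a unique customer (they sum to 1.0).
FIELD_WEIGHTS = {"name": 0.5, "address": 0.3, "phone": 0.2}

def field_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; stands in for field-specific comparators."""
    return SequenceMatcher(None, a, b).ratio() if (a or b) else 1.0

def record_similarity(r1: dict, r2: dict) -> float:
    """Weighted aggregate of per-field similarities."""
    return sum(w * field_similarity(r1.get(f, ""), r2.get(f, ""))
               for f, w in FIELD_WEIGHTS.items())

def cluster_duplicates(records, threshold=0.85, queue_size=4):
    """Cluster records sorted on a key field, keeping only the most recently
    matched clusters in a bounded queue so each record is compared to few candidates."""
    queue = []      # most recently matched clusters, newest first
    clusters = []   # all clusters found
    for rec in sorted(records, key=lambda r: r.get("name", "")):
        for cluster in queue:
            # Compare against the cluster's representative (its first record).
            if record_similarity(rec, cluster[0]) >= threshold:
                cluster.append(rec)
                queue.remove(cluster)
                queue.insert(0, cluster)   # promote the matched cluster
                break
        else:
            new_cluster = [rec]
            clusters.append(new_cluster)
            queue.insert(0, new_cluster)
            queue[:] = queue[:queue_size]  # drop the oldest cluster if the queue is full
    return clusters

if __name__ == "__main__":
    data = [
        {"name": "南京某某软件有限公司", "address": "玄武区中山路100号", "phone": "02512345678"},
        {"name": "南京某某软件公司", "address": "玄武区中山路100号", "phone": "02512345678"},
        {"name": "苏州另一家贸易有限公司", "address": "工业园区星湖街1号", "phone": "051287654321"},
    ]
    for c in cluster_duplicates(data):
        print([r["name"] for r in c])

In this sketch the two near-identical company records fall into one cluster while the unrelated one starts its own; in the system described by the thesis, such machine-produced clusters would then be confirmed manually before customers are merged.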
Keywords/Search Tags: data quality, data cleaning, ETL, similar duplicate records, feature characters, similarity matching