Research on Data Quality and Cleaning Evaluation Technology in Network Audit

Posted on: 2017-07-06
Degree: Master
Type: Thesis
Country: China
Candidate: M Y Zhou
Full Text: PDF
GTID: 2348330518470809
Subject: Computer Science and Technology
Abstract/Summary:
Auditing has moved from traditional manual work to computer-based audit, producing ever larger volumes of data; yet the data alone do not yield information, so audits are often data-rich but knowledge-poor. Data quality determines how useful these data are: only data of good quality can support correct decisions and credible conclusions. Evaluating data quality and then cleaning the data is the usual way to improve it. This thesis studies methods of data quality assessment and data cleaning in the audit field.

The thesis first examines the principles of data cleaning and the methods for cleaning different kinds of dirty data. Audit data have a distinctive characteristic: abnormal records may reflect genuinely abnormal phenomena in the audited business, so for audit purposes, the more effective abnormal data a data set retains, the higher its quality. In network auditing, data quality can therefore be assessed in terms of the data's potential for online audit search. Building on this characteristic, the thesis proposes a method for evaluating the audit potential of data in the audit field.

For data cleaning, the common field-matching algorithms, namely Levenshtein (edit) distance, Smith-Waterman distance, and Hamming distance, are presented and analyzed in detail. Algorithms based on the "sort and merge" idea, namely the basic sorted-neighborhood method, the multi-pass sorted-neighborhood method, and the priority-queue algorithm, are studied, and an algorithm for detecting approximately duplicated records based on locality-sensitive hashing (LSH) is proposed and compared against the sort-and-merge family. The sort-and-merge algorithms are sensitive to the choice of sort key: different sort keys can produce different clusterings, whereas the LSH-based algorithm is insensitive to key order. Moreover, because duplicate records are relatively rare, sort-and-merge algorithms compare many records that are not duplicates at all, while the LSH-based algorithm sharply reduces comparisons between dissimilar records. Experimental results show that the LSH-based duplicate detection algorithm needs roughly an order of magnitude fewer record comparisons than the traditional algorithms, although its precision and recall are slightly lower.
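To make the field-matching step concrete, here is a minimal Python sketch of Levenshtein edit distance, one of the three distances the abstract names. The function name and the single-row dynamic-programming layout are illustrative choices, not code from the thesis.

    def levenshtein(a, b):
        # d[j] holds the edit distance from a[:i] to b[:j] for the current row i.
        m, n = len(a), len(b)
        d = list(range(n + 1))
        for i in range(1, m + 1):
            prev, d[0] = d[0], i
            for j in range(1, n + 1):
                cur = d[j]
                d[j] = min(d[j] + 1,                      # delete a[i-1]
                           d[j - 1] + 1,                  # insert b[j-1]
                           prev + (a[i - 1] != b[j - 1])) # substitute (or match)
                prev = cur
        return d[n]

    # e.g. a misspelled field value in a dirty record:
    assert levenshtein("receivable", "recievable") == 2

Field matching typically flags two values as the same entity when this distance, normalized by field length, falls below a chosen threshold.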
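The "sort and merge" family the abstract studies can likewise be sketched briefly. The following is a generic basic sorted-neighborhood method, assuming caller-supplied key and similar functions (both hypothetical placeholders here): records are sorted on a key, then each record is compared only with its neighbors inside a fixed-size sliding window.

    def sorted_neighborhood(records, key, similar, window=5):
        # Sort on the chosen key; only records that land near each other
        # in sorted order are ever compared, so the key choice matters.
        recs = sorted(records, key=key)
        pairs = []
        for i, r in enumerate(recs):
            for j in range(i + 1, min(i + window, len(recs))):
                if similar(r, recs[j]):
                    pairs.append((r, recs[j]))
        return pairs

This sketch also shows why the method is key-sensitive, as the abstract notes: two duplicates whose sort keys differ (say, a typo in the leading characters) can end up far apart after sorting and never fall into the same window; the multi-pass variant mitigates this by repeating the scan with several different keys.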
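Finally, a minimal sketch of duplicate-candidate generation with locality-sensitive hashing, using the standard MinHash-plus-banding construction. This is a generic illustration of the technique, not the thesis's specific algorithm; the function names and parameters are assumptions, tokenize is a caller-supplied tokenizer, and Python's built-in hash (process-salted for strings) stands in for a proper hash family.

    from collections import defaultdict

    def minhash_signature(tokens, seeds):
        # One MinHash value per seeded hash function; assumes a non-empty token set.
        return [min(hash((s, t)) for t in tokens) for s in seeds]

    def lsh_candidates(records, tokenize, n_hashes=20, bands=5):
        # Split each signature into bands; records sharing any band bucket
        # become candidate duplicates, so most dissimilar pairs are never compared.
        seeds = range(n_hashes)
        rows = n_hashes // bands
        buckets = defaultdict(list)
        for idx, rec in enumerate(records):
            sig = minhash_signature(tokenize(rec), seeds)
            for b in range(bands):
                buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].append(idx)
        candidates = set()
        for ids in buckets.values():
            for i in range(len(ids)):
                for j in range(i + 1, len(ids)):
                    candidates.add((ids[i], ids[j]))
        return candidates

    records = ["John Smith, 42 Main St", "Jon Smith, 42 Main St.", "Alice Wu, 9 Elm Rd"]
    print(lsh_candidates(records, tokenize=lambda r: set(r.lower().split())))

Because the signature depends only on the set of tokens, not on any sort order, this construction is insensitive to key order, which is the property the abstract contrasts against the sort-and-merge algorithms; the trade-off is that banding is probabilistic, consistent with the slightly lower precision and recall reported.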
Keywords/Search Tags: Audit, Data quality, Data cleaning