Research On Data Quality Of Data Center Based On Business Rule

Posted on: 2013-02-19
Degree: Master
Type: Thesis
Country: China
Candidate: H G Cong
Full Text: PDF
GTID: 2218330374966023
Subject: Petroleum engineering calculations

Abstract/Summary:
To improve data quality, much domestic and international research has been carried out on the factors that affect data quality and on methods for improving it. These studies focus primarily on data quality issues in data warehouses, and they give data quality metric indicators together with methods for calculating them. The open problems can be summarized as follows. Firstly, no systematic set of data quality assessment indicators has been formed so far, and no complete quality system has been built. Secondly, no authoritative data quality reference model exists, and much current work addresses only single issues. Finally, the definition of data quality varies, which requires a data quality model to be scalable enough to accommodate this change. The following research has been carried out to address these issues.

First of all, a complete data quality assessment system is proposed and built in this paper. It defines seven categories of data quality elements, such as accuracy and consistency, and fifteen dimension rules, such as the non-null constraint and the range constraint. The data quality elements are used to describe data quality, while the data quality constraint rules reflect specific business rules and domain knowledge. Secondly, definitions and algorithms for some indicators are proposed, and a data quality analysis and evaluation architecture and its processes are built.
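As an illustration of how constraint rules such as the non-null and range constraints could feed a data quality indicator, the following is a minimal sketch and not the thesis's implementation. The `Rule` class, the helper names, the field names (`well_id`, `depth_m`) and the choice of "fraction of records passing" as the indicator are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Rule:
    """A hypothetical constraint rule bound to one field of a record."""
    name: str
    field: str
    check: Callable[[Any], bool]


def non_null(field: str) -> Rule:
    # Passes when the value is present and is not an empty string.
    return Rule(f"non_null({field})", field,
                lambda v: v is not None and v != "")


def in_range(field: str, low: float, high: float) -> Rule:
    # Passes when the value is a number within [low, high]; missing values fail.
    def check(v: Any) -> bool:
        is_number = isinstance(v, (int, float)) and not isinstance(v, bool)
        return is_number and low <= v <= high
    return Rule(f"in_range({field})", field, check)


def pass_rates(records: List[Dict[str, Any]], rules: List[Rule]) -> Dict[str, float]:
    """Return, for each rule, the fraction of records that satisfy it."""
    if not records:
        return {}
    rates = {}
    for rule in rules:
        passed = sum(1 for r in records if rule.check(r.get(rule.field)))
        rates[rule.name] = passed / len(records)
    return rates


# Made-up sample records, for illustration only.
records = [
    {"well_id": "W-1", "depth_m": 1200.0},
    {"well_id": None, "depth_m": 950.0},
    {"well_id": "W-3", "depth_m": -5.0},
    {"well_id": "W-4", "depth_m": None},
]
rules = [non_null("well_id"), in_range("depth_m", 0.0, 10000.0)]
rates = pass_rates(records, rules)
print(rates)
```

Keeping each rule as data (a name, a field and a predicate) means new business rules can be added without changing the evaluation loop, which is one way to obtain the extensibility the abstract asks for.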
The entire architecture is divided into a data layer and an application layer. The data layer includes the instance layer, the pattern layer, the data quality layer and the data quality extension layer. Among these, the data quality layer is described by a data quality metamodel, and the data quality extension layer provides an extension of the data quality model. The application layer includes the data quality assessment layer and the presentation layer.

Thirdly, for the problem of approximately duplicated records that exist in the data center, the traditional "sort & merge" approach is used, and an improved detection method based on clustering of inner code sequence values is proposed in the paper. For string matching, a sequence alignment algorithm from bioinformatics is used. The improved method raised detection efficiency and achieved good results in practical applications.

Finally, against the background of data quality testing and evaluation in the data center of the Daqing Oilfield Downhole Service Branch Company, the proposed data quality testing and evaluation system is designed and implemented. The system manages and maintains a variety of business rules and assesses a variety of data quality indicators. It has been running in the data center of the Downhole Service Branch Company, where it plays an important role in improving data quality.
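The general idea of combining "sort & merge" with sequence alignment can be sketched as follows. This is not the thesis's improved method: the clustering on inner code sequence values is not described in the abstract and is not reproduced here. The sketch assumes a plain sorted-neighborhood pass, uses Needleman-Wunsch global alignment as the string matcher, and picks the scoring values, window size and threshold arbitrarily for illustration.

```python
from typing import List, Tuple


def nw_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Needleman-Wunsch global alignment score, computed row by row."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Alignment score divided by the longer length; 1.0 for identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else nw_score(a, b) / longest


def find_near_duplicates(values: List[str], window: int = 3,
                         threshold: float = 0.8) -> List[Tuple[int, int]]:
    """Sort by a normalized key, then compare each record only with the
    next (window - 1) records in sorted order."""
    keyed = sorted((v.strip().lower(), i) for i, v in enumerate(values))
    pairs = []
    for pos, (key_a, idx_a) in enumerate(keyed):
        for key_b, idx_b in keyed[pos + 1: pos + window]:
            if similarity(key_a, key_b) >= threshold:
                pairs.append((min(idx_a, idx_b), max(idx_a, idx_b)))
    return pairs


# Made-up sample values, for illustration only.
names = ["pump unit A-101", "pump unit A101", "tubing string 73mm", "Pump Unit A-101 "]
pairs = find_near_duplicates(names)
print(sorted(pairs))
```

Sorting first means each record is compared with only a few neighbors rather than with every other record, which is where the efficiency of the "sort & merge" family comes from. The trade-off is that near-duplicates whose keys sort far apart are missed.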
Keywords/Search Tags: data quality, data center, business rules, data quality assessment system, data quality metamodel