Research On Data Cleaning Methodology Of Evaluation System For Field Military Exercises

Posted on: 2012-04-08  Degree: Master  Type: Thesis
Country: China  Candidate: Z C Ye
GTID: 2218330371962556  Subject: Detection Technology and Automation
Abstract/Summary:
As military exercises become increasingly informatized, evaluation systems for field military exercises find it difficult to provide objective evaluation and impartial judgement, both of which depend on real-time, objective, and precise data. Complete duplicate data, erroneous evaluation information, and missing data are the main factors that degrade the data quality of the evaluation system; they lead to less objective evaluation, less impartial judgement, and inaccurate situation assessment, and can even undermine the credibility of the system. To improve data quality and support military exercises, this thesis focuses on solutions to these problems.

Classical duplicate data detection algorithms have high complexity and cannot meet the real-time demands of the evaluation system. To address this, a complete duplicate data detection algorithm based on characteristic values is proposed, which adopts a hashing-plus-matching method. First, the data of interest are combined into a block according to the extraction rules; next, the characteristic value of the block is calculated with the characteristic value generating function; finally, records with the same characteristic value are matched. Several key techniques of the algorithm are also analyzed, including the deterministic field extraction rules, the characteristic value generating function, and the methods for handling conflicts. Experimental results show that, compared with the classical detection algorithm, the proposed algorithm substantially improves real-time processing ability; although it increases space complexity and slightly reduces the recall rate, it greatly improves the detection efficiency for complete duplicate data.

An outlier-based algorithm for detecting erroneous evaluation information and a heuristic error correction algorithm based on keyboard distance are proposed.
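
The hashing-plus-matching approach to complete duplicate detection described above can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the use of MD5 as the characteristic value generating function, and the choice to resolve hash conflicts by comparing the extracted field values directly are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict


def characteristic_value(record, fields):
    """Hypothetical generating function: hash the extracted fields of a record."""
    # Join the selected fields with a separator unlikely to occur in the data.
    block = "\x1f".join(str(record[f]) for f in fields)
    return hashlib.md5(block.encode("utf-8")).hexdigest()


def find_complete_duplicates(records, fields):
    """Return indices of records that repeat an earlier record on `fields`."""
    buckets = defaultdict(list)  # characteristic value -> indices of distinct records
    duplicates = []
    for i, rec in enumerate(records):
        key = characteristic_value(rec, fields)
        block = tuple(rec[f] for f in fields)
        # Matching step: only records sharing a characteristic value are compared,
        # and the actual field values are checked to rule out hash conflicts.
        if any(tuple(records[j][f] for f in fields) == block for j in buckets[key]):
            duplicates.append(i)
        else:
            buckets[key].append(i)
    return duplicates


records = [
    {"unit": "A", "event": "hit", "time": 1},
    {"unit": "B", "event": "hit", "time": 1},
    {"unit": "A", "event": "hit", "time": 1},
]
duplicate_indices = find_complete_duplicates(records, ["unit", "event", "time"])
```

Because each record is compared only against records in the same bucket, the work per record stays small as the data set grows, at the cost of the extra memory held by the bucket table.
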
Rule-based error data detection algorithms cannot meet the real-time demands of detection. To solve this problem, the rules are divided into one-value rules and multi-value rules and combined with outliers, yielding an outlier-based detection algorithm for erroneous evaluation information. In addition, based on an analysis of how erroneous evaluation information is generated, a heuristic error correction algorithm based on keyboard distance is proposed to avoid losing useful information.

The K-NN (K-Nearest Neighbor) algorithm for estimating missing evaluation information is modified. First, the real-time processing ability of the algorithm is enhanced by limiting the search space, which reduces the influence of data size. Second, because the algorithm's estimates are incorrect in some cases, expert knowledge and experience are converted into fuzzy rules and combined with the K-NN algorithm so that incorrect estimates are eliminated. Theoretical analysis and experimental results show that the algorithm for missing evaluation information runs in real time, reflects professional expertise, and has a reduced error rate.

The proposed algorithms have been applied to the evaluation system for field military exercises, and test results show that the proposed methods improve real-time processing ability and data quality, laying the groundwork for objective evaluation and impartial judgement.
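
The idea behind a keyboard-distance correction heuristic, as mentioned above, can be sketched as follows. The abstract does not give the thesis's distance definition or correction procedure, so everything here is an assumption: a QWERTY layout, Manhattan distance between key positions, same-length candidates only, and a hypothetical `max_distance` threshold.

```python
# Assumed layout: rows of a QWERTY keyboard, without row stagger.
QWERTY_ROWS = ["1234567890", "qwertyuiop", "asdfghjkl", "zxcvbnm"]
KEY_POS = {ch: (r, c) for r, row in enumerate(QWERTY_ROWS) for c, ch in enumerate(row)}
UNKNOWN_KEY_PENALTY = 100  # distance used for characters not in the layout


def key_distance(a, b):
    """Manhattan distance between two keys; 0 for identical characters."""
    if a == b:
        return 0
    if a not in KEY_POS or b not in KEY_POS:
        return UNKNOWN_KEY_PENALTY
    (r1, c1), (r2, c2) = KEY_POS[a], KEY_POS[b]
    return abs(r1 - r2) + abs(c1 - c2)


def word_distance(typed, candidate):
    """Sum of per-character key distances, or None if the lengths differ."""
    if len(typed) != len(candidate):
        return None
    return sum(key_distance(a, b) for a, b in zip(typed.lower(), candidate.lower()))


def correct(typed, valid_values, max_distance=2):
    """Return the valid value closest to `typed`, or None if none is close enough."""
    best, best_d = None, None
    for cand in valid_values:
        d = word_distance(typed, cand)
        if d is None or d > max_distance:
            continue
        if best_d is None or d < best_d:
            best, best_d = cand, d
    return best


valid = ["tank", "truck", "jeep"]
suggestion = correct("tsnk", valid)  # 's' is adjacent to 'a' on the assumed layout
```

Correcting a value to its nearest valid neighbour, rather than discarding the record, is one way to keep the useful information in an entry that contains a slip of the finger; returning `None` when nothing is close leaves the decision to another rule or to a human reviewer.
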
Keywords/Search Tags: Data Cleaning, Evaluation System for Field Military Exercises, Complete Duplicate Data, Error Data, Missing Data, Characteristic Value, Fuzzy Rules