
Research on Data Imputation Methods Oriented to Specific Domains

Posted on: 2022-11-29
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Shi
Full Text: PDF
GTID: 2518306773481234
Subject: Trade Economy
Abstract/Summary:
With the rapid development of data-related theories and technologies, massive amounts of data have been generated in fields such as finance, medical care, and electric power. These data sources are widely distributed, variable in format, and large in scale. However, because of errors introduced during network transmission and manual review and verification, some attributes in the data sets are missing or even wrong, and data quality is difficult to guarantee. Data quality is an important foundation for analysis, decision making, and visualization, so improving it has become a hot issue in the field of data processing.

This thesis investigates the data imputation problem. Many scholars have studied this problem and proposed fruitful solutions. However, most of these methods are general-purpose: they do not consider the specific characteristics of a domain, and their imputation efficiency is unsatisfactory when filling domain-specific data, which often has characteristics of its own. This thesis therefore proposes a domain-oriented data filling method. Taking the financial domain as an example, it applies different strategies to impute incomplete data sets according to the characteristics of the missing data. The main contributions are as follows.

(1) For dependent missing data, a reconstruction-based multi-dependent missing data filling algorithm is proposed. First, an improved clustering algorithm and a cluster-based attribute reduction algorithm are used to obtain the data set associated with the missing data. Second, a moving window filling algorithm is proposed to find, within the associated data set, candidate filling data similar to the missing data. Then, a cache index strategy is used to reduce the complexity of the algorithm. Finally, the missing data are filled, improving both the efficiency and the accuracy of filling.

(2) For missing data in extended data sets, a multi-feature extended missing data filling algorithm is proposed. First, a functional dependency algorithm is used to construct query keywords, which are used to retrieve relevant web page text fragments from search engines. Second, a web page source credibility evaluation table is designed to screen suitable text fragments, and candidate entities are extracted from those fragments. Then, the features of each candidate entity are extracted in combination with its context features. Finally, an improved evidence fusion algorithm is proposed to fuse the multiple features of the candidate entities, and the candidate entity with the largest fused feature value is used as the target entity to fill in the missing data.

(3) Experiments on multiple real data sets compare the proposed algorithms with the more advanced existing filling algorithms. The results show that the two missing data filling algorithms proposed in this thesis can effectively impute the missing values in incomplete data sets.
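
The evidence fusion step in (2) can be illustrated with classical D-S evidence theory. The abstract does not specify the improved fusion algorithm, so the sketch below swaps in the standard, unmodified Dempster rule of combination. The candidate names `"A"` and `"B"`, the two feature sources, and all mass values are hypothetical and chosen only for illustration.

```python
from itertools import product


def combine(m1, m2):
    """Standard Dempster rule of combination for two mass functions.

    Each mass function maps a frozenset of hypotheses to a mass in [0, 1],
    and the masses of each function sum to 1.
    """
    fused = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            # Agreeing evidence supports the intersection of the two sets.
            fused[common] = fused.get(common, 0.0) + wa * wb
        else:
            # Disjoint sets contribute to the conflict mass K.
            conflict += wa * wb
    norm = 1.0 - conflict
    if norm <= 1e-12:
        raise ValueError("sources are in total conflict; rule is undefined")
    # Renormalize by 1 - K so the fused masses sum to 1 again.
    return {k: v / norm for k, v in fused.items()}


# Hypothetical candidate entities for one missing cell.
A, B = frozenset({"A"}), frozenset({"B"})
EITHER = A | B  # mass left on the whole frame expresses uncertainty

# Hypothetical evidence from two features (e.g. source credibility, context).
m_credibility = {A: 0.6, B: 0.1, EITHER: 0.3}
m_context = {A: 0.5, B: 0.3, EITHER: 0.2}

fused = combine(m_credibility, m_context)

# Choose the single candidate with the largest fused mass as the fill value.
singletons = {next(iter(k)): v for k, v in fused.items() if len(k) == 1}
best = max(singletons, key=singletons.get)
print(best, fused)
```

Keeping some mass on the full candidate set lets a weak feature express uncertainty instead of forcing a choice. More than two features can be fused by applying `combine` repeatedly.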
Keywords/Search Tags: Missing Data, Moving Window Matching, Feature Extraction, D-S Evidence Theory, Data Imputation