Data Cleaning And Integration Algorithms For Expert Information Collection

Posted on: 2014-07-15
Degree: Master
Type: Thesis
Country: China
Candidate: J Zhou
Full Text: PDF
GTID: 2268330425972238
Subject: Computer Science and Technology
Abstract/Summary:
Because the volume of expert information is tremendous, several problems arise during collection: high redundancy, low credibility, and inconsistent description methods, all of which make it hard to guarantee accurate results. Standardization during data cleaning and the subsequent integration step determine whether the collected information is usable. This thesis therefore focuses on methods for normalizing and effectively merging expert information.

Since experts' category information is not standardized, this thesis proposes a data standardization algorithm, Feature-based Data Standardization (FDS), building on a study of traditional data cleaning algorithms. FDS summarizes the data characteristics of experts' achievements from a training set and calculates a weight for each data item's characteristics, so that every data item can be identified and normalized into the desired form. Simulation results and algorithm analysis show that FDS outperforms existing algorithms in time consumption without degrading identification accuracy on large data sets.

Since experts' attribute values exhibit high redundancy and low reliability, this thesis also proposes an automatic information fusion algorithm, Granular Computing-based Information Fusion (GCIF). To improve fusion results, the algorithm calculates, or reasonably sets, a weight for each result, constructs a knowledge granule graph from all information samples, and then finds the path with the maximum weight; the knowledge granules on that path form the fusion result. Simulation results and algorithm analysis show that GCIF produces good fusion results and surpasses other algorithms in merging accuracy when the data is large and contains varied conflicts.

The quality of the information strongly influences the accuracy of expert information fusion, so information should be standardized before fusion.
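The feature-weighted identification step that FDS performs can be illustrated with a small sketch. The thesis does not publish the algorithm's internals, so the feature patterns, weights, and record types below are purely illustrative assumptions: each record type carries a set of weighted textual features (as if learned from a training set), and an item is assigned the type with the highest total feature weight before a type-specific normalization is applied.

```python
# Illustrative sketch of feature-weighted data standardization in the spirit
# of FDS. Feature patterns, weights, and type names are assumptions, not the
# thesis's actual values.

import re

# Weighted features per record type, as if summarized from a training set.
FEATURE_WEIGHTS = {
    "journal_paper": {r"\bvol\.": 0.4, r"\bno\.": 0.3, r"\bpp\.": 0.3},
    "patent":        {r"\bpatent\b": 0.6, r"\bCN\d+": 0.4},
}

def classify(item: str) -> str:
    """Pick the record type whose weighted features best match the item."""
    scores = {
        rtype: sum(w for pat, w in feats.items()
                   if re.search(pat, item, re.IGNORECASE))
        for rtype, feats in FEATURE_WEIGHTS.items()
    }
    return max(scores, key=scores.get)

def standardize(item: str):
    """Tag the item with its identified type so a later step can apply a
    type-specific normalization template."""
    return {"type": classify(item), "raw": item.strip()}

record = "Zhou J. Data cleaning methods. J. of CS, vol. 12, no. 3, pp. 1-10"
print(standardize(record)["type"])  # journal_paper
```

Once an item's type is known, a fixed output template per type (author list, title, venue, pages) gives every record a consistent description format, which is the precondition for the fusion step.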
Traditional data cleaning algorithms generally have high complexity. The FDS algorithm proposed in this thesis standardizes expert information and effectively improves its quality at a lower time cost. The proposed GCIF algorithm improves the integrity and accuracy of information fusion results on large data sets. This work is valuable for data mining, knowledge discovery, and other related tasks.
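The fusion idea behind GCIF can also be sketched, with the caveat that the thesis's actual graph construction and weighting are not publicly specified. In the simplified sketch below, each candidate attribute value is treated as a knowledge granule weighted by how many sources assert it; because each attribute forms an independent layer here, the maximum-weight path through the layers reduces to picking the heaviest granule per attribute (a real granule graph could add edge weights to enforce cross-attribute consistency).

```python
# Illustrative sketch of granule-weighted fusion in the spirit of GCIF.
# Granule weight = number of sources asserting the value; this simplification
# and the sample data are assumptions, not the thesis's actual method.

from collections import Counter

def fuse(records):
    """Fuse several redundant records (dicts with the same keys) into one.

    Per attribute layer, the granule with the maximum weight (source
    support) is kept; the chosen granules form the fused result."""
    fused = {}
    for attr in records[0].keys():
        counts = Counter(r[attr] for r in records if r.get(attr))
        fused[attr] = counts.most_common(1)[0][0]
    return fused

sources = [
    {"name": "J. Zhou", "affiliation": "CS Dept."},
    {"name": "Zhou J.", "affiliation": "CS Dept."},
    {"name": "J. Zhou", "affiliation": "Dept. of CS"},
]
print(fuse(sources))  # {'name': 'J. Zhou', 'affiliation': 'CS Dept.'}
```

The example shows why standardization must precede fusion: if "J. Zhou" and "Zhou J." were first normalized to one form, its granule weight would rise to 3 and the fused record would become unambiguous.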
Keywords/Search Tags:Information fusion, Data standardization, Knowledge graph, Maximum weighted path, Granular computing, Expert database