
Research on Multi-Source Unstructured Data Integration Based on User Feedback

Posted on: 2016-01-23  Degree: Master  Type: Thesis
Country: China  Candidate: Y P Shen  Full Text: PDF
GTID: 2308330470967743  Subject: Computer system architecture
Abstract/Summary:
In recent years, with the rapid development of information technology, data volumes have kept growing and data types have become increasingly diverse. In this context, the China Knowledge Center for Engineering and Sciences Technology Program was started in 2012 with the aim of connecting massive data resources. Integrating multiple engineering data sources poses two challenges. First, because network bandwidth is limited, conventional integration methods are inefficient. Second, the Knowledge Center holds a lot of unstructured data, which relational techniques handle poorly. Motivated by these requirements, this paper proposes an approach to unstructured data integration that combines data integration, unstructured data processing, and a user feedback mechanism. The approach supports integrating multiple data sources that serve image or text data. Three influencing factors are used to optimize querying in the integration system: a user feedback factor, a query history factor, and the similarity between a query and a sample of each data source.

First, this paper reviews existing work on data integration, unstructured data processing, and user feedback. Second, it introduces the framework of the integration system, whose main parts are data source registration, data source selection, result merging, and user feedback, with emphasis on the feedback part, which contributes substantially. It then shows how the three influencing factors are used to select data sources and merge results. Finally, experiments on the MIRFlickr data set show how this work affects the results of unstructured data integration.
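As a rough illustration of factor-based data source selection, the sketch below ranks sources by a weighted sum of the three factors named in the abstract. The abstract does not say how the factors are combined, so the linear combination, the weights, the `SourceStats` fields, and the function names are all assumptions made for this example, not the thesis's actual method.

```python
from dataclasses import dataclass


@dataclass
class SourceStats:
    """Per-source signals. All fields are hypothetical and scaled to [0, 1]."""
    name: str
    feedback: float    # assumed: share of positive user feedback on past results
    history: float     # assumed: how well similar past queries were served
    similarity: float  # assumed: similarity between the query and the source sample


def score(src, weights=(0.4, 0.2, 0.4)):
    """Weighted sum of the three factors (the weights are illustrative only)."""
    w_feedback, w_history, w_similarity = weights
    return (w_feedback * src.feedback
            + w_history * src.history
            + w_similarity * src.similarity)


def select_sources(sources, k=2, weights=(0.4, 0.2, 0.4)):
    """Return the k highest-scoring sources, best first."""
    ranked = sorted(sources, key=lambda s: score(s, weights), reverse=True)
    return ranked[:k]


# Made-up example values, not taken from the thesis experiments.
sources = [
    SourceStats("images_a", feedback=0.9, history=0.5, similarity=0.7),
    SourceStats("texts_b", feedback=0.2, history=0.4, similarity=0.3),
    SourceStats("images_c", feedback=0.6, history=0.8, similarity=0.9),
]
selected = select_sources(sources, k=2)
print([s.name for s in selected])
```

The same per-source score could also be reused as a weight when merging result lists from the selected sources, though that too is only one possible design.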
Keywords/Search Tags: data integration, unstructured data, user feedback, data source selection