
Research On Personal Information Fusion System Based On Distributed File Storage

Posted on: 2011-09-12
Degree: Master
Type: Thesis
Country: China
Candidate: J He
GTID: 2178360308960946
Subject: Computer Science and Technology
Abstract/Summary:
With the globalization of competition, the level of enterprise informatization has become one of the key factors influencing enterprise development. However, the database application systems of many companies and government departments are distributed, independent, and heterogeneous, and they have formed information silos within their respective local area networks, because these information systems were developed in separate phases. To share valuable information, data must be exchanged and integrated across these systems, which makes data integration an important research topic in the field of database applications. As data volumes grow at an exponential rate, the requirements of data management and data processing have exceeded the capability of a traditional file system located on a single computer. Cloud computing, as a service-oriented computing model, is well suited to these data integration needs.

In this thesis, drawing on several well-established distributed file systems, I propose a container-based distributed file storage model for managing large data sets composed of huge numbers of small files. Making the container, rather than the individual file, the basic unit of replication and location reduces the metadata burden during system operation and increases scalability; in addition, storing related data together in clusters optimizes the on-disk layout.

Second, to bridge the gap between the computation required by the data cleaning process and the computing power of a single computer, I use Map/Reduce to distribute the computation across a cluster.

Finally, building on these two components, I design a personal information fusion system that integrates multi-source, heterogeneous data, and describe in detail the functions and working principles of each module.
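The container idea can be illustrated with a small in-memory sketch. Everything in it is hypothetical (`Container`, `ContainerStore`, the round-robin replica placement, and the `(container_id, name)` handle are illustrative assumptions, not the thesis's implementation). It shows the metadata argument: small files are packed back to back into fixed-capacity containers that keep their own local name index, so the metadata server records replica locations once per container instead of once per file.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    """Fixed-capacity unit that packs many small files back to back."""
    container_id: int
    capacity: int
    data: bytearray = field(default_factory=bytearray)
    # Local index kept inside the container: file name -> (offset, length).
    index: dict[str, tuple[int, int]] = field(default_factory=dict)

    def free(self) -> int:
        return self.capacity - len(self.data)

    def append(self, name: str, payload: bytes) -> None:
        self.index[name] = (len(self.data), len(payload))
        self.data.extend(payload)

    def read(self, name: str) -> bytes:
        offset, length = self.index[name]
        return bytes(self.data[offset:offset + length])


class ContainerStore:
    """Hypothetical metadata server that tracks replica placement per container."""

    def __init__(self, capacity: int, nodes: list[str], replicas: int = 3) -> None:
        if replicas > len(nodes):
            raise ValueError("need at least as many nodes as replicas")
        self.capacity = capacity
        self.nodes = nodes
        self.replicas = replicas
        self.containers: list[Container] = []
        # Master-side metadata: one entry per container, not per file.
        self.locations: dict[int, list[str]] = {}

    def _new_container(self) -> Container:
        cid = len(self.containers)
        container = Container(cid, self.capacity)
        self.containers.append(container)
        # Round-robin placement; the whole container is replicated as one unit.
        self.locations[cid] = [self.nodes[(cid + i) % len(self.nodes)]
                               for i in range(self.replicas)]
        return container

    def put(self, name: str, payload: bytes) -> tuple[int, str]:
        """Store a small file and return a (container_id, name) handle."""
        if len(payload) > self.capacity:
            raise ValueError("file is larger than a container")
        if not self.containers or self.containers[-1].free() < len(payload):
            self._new_container()
        container = self.containers[-1]
        container.append(name, payload)
        return container.container_id, name

    def get(self, handle: tuple[int, str]) -> bytes:
        cid, name = handle
        return self.containers[cid].read(name)

    def locate(self, handle: tuple[int, str]) -> list[str]:
        # Replica nodes are looked up via the container, never per file.
        return self.locations[handle[0]]


if __name__ == "__main__":
    store = ContainerStore(capacity=10_000, nodes=["n0", "n1", "n2", "n3"])
    handles = [store.put(f"rec{i}.txt", b"x" * 100) for i in range(1000)]
    print(len(handles), "files,", len(store.locations), "placement entries")
    print(store.locate(handles[150]), store.get(handles[150])[:5])
```

With 1,000 files of 100 bytes and a 10,000-byte container capacity, the store creates 10 containers, so the master holds 10 placement entries rather than 1,000. Returning the container ID inside the handle lets a client go straight to the right container without a per-file lookup at the master; a production design would also need deletion, compaction, and persistence, which this sketch omits.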
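The Map/Reduce-based cleaning step can be sketched as a single-process simulation of the map, shuffle, and reduce phases. The record schema (`id_no`, `name`, `phone`, `email`), the choice of `id_no` as the fusion key, and the majority-vote merge rule are assumptions for illustration only; they are not the thesis's documented cleaning rules, and the code does not use any real Map/Reduce framework's API. In a cluster, `map_record` and `reduce_person` would run on many nodes while the framework performs the grouping.

```python
from collections import Counter, defaultdict
from typing import Iterable, Iterator


def map_record(source: str, raw: dict[str, str]) -> Iterator[tuple[str, dict[str, str]]]:
    """Map: clean one raw record and emit it under a normalized key."""
    id_no = raw.get("id_no", "").strip().upper()
    if not id_no:
        return  # records without a usable key are dropped in this sketch
    yield id_no, {
        "name": " ".join(raw.get("name", "").split()).title(),
        "phone": "".join(ch for ch in raw.get("phone", "") if ch.isdigit()),
        "email": raw.get("email", "").strip().lower(),
        "source": source,
    }


def reduce_person(key: str, records: list[dict[str, str]]) -> dict[str, str]:
    """Reduce: fuse all cleaned records for one person by majority vote per field."""
    fused = {"id_no": key}
    for field_name in ("name", "phone", "email"):
        values = [r[field_name] for r in records if r[field_name]]
        # Ties go to the value seen first (Counter keeps insertion order).
        fused[field_name] = Counter(values).most_common(1)[0][0] if values else ""
    fused["sources"] = ",".join(sorted({r["source"] for r in records}))
    return fused


def run_mapreduce(inputs: Iterable[tuple[str, dict[str, str]]]) -> list[dict[str, str]]:
    """Single-process stand-in for the map -> shuffle -> reduce pipeline."""
    groups: dict[str, list[dict[str, str]]] = defaultdict(list)
    for source, raw in inputs:                    # map phase
        for key, value in map_record(source, raw):
            groups[key].append(value)             # shuffle: group values by key
    return [reduce_person(k, v) for k, v in sorted(groups.items())]  # reduce phase


if __name__ == "__main__":
    inputs = [
        ("hr_db", {"id_no": " 110105x ", "name": "zhang  san",
                   "phone": "138-0000-1111", "email": "ZS@Example.com"}),
        ("crm", {"id_no": "110105X", "name": "Zhang San", "phone": "", "email": "zs@example.com"}),
        ("gov", {"id_no": "220203", "name": "li si", "phone": "(139) 0000 2222", "email": ""}),
    ]
    for person in run_mapreduce(inputs):
        print(person)
```

Because every cleaned record for one person shares a key, the records from different source systems meet in the same reduce call, which is what lets the fusion logic stay local to a single reducer. A real system would likely need fuzzier matching than an exact key, which this sketch does not attempt.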
Keywords/Search Tags: Distributed File Storage, Data Integration, Parallel Computing, Map/Reduce