| The rapid development of modern data acquisition technology makes data continues to grow in all areas, faced with massive data conventional computer powerless. Under this background cloud computing borned quietly, Hadoop is an open source software under the Apache Foundation, which implements the infrastructure, including distributed file systems and distributed computing framework for cloud computing software platform, And on which integrates including databases, cloud computing management, data warehousing and a series of platforms. It has become standard platform in industry and academia for cloud computing applications and research. HBase is the database in Apache Hadoop, which can provide random, real-time read / write access to scale the data.With the improvement of people’s living standards, logistics vehicles become more and more, GPS(Global Posotioning System) vehicle data generated more and more, we hope that time is stored vehicle information getting longer and longer, the face of mass GPS data the traditional handling, storage has been inadequate, and cloud computing, the emergence of cloud storage technology to handle massive data storage provides a good solution.First, hadoop basics is introduced and the core subprojects of Hadoop Hadoop Distributed File System and Distributed processing programming model is given,and HBase is introduced simply. Second, I build the Needed distributed environment,which include the building of Hadoop, HBase and Sqoop. Then, By the Hadoop distributed programming model I process GPS information, and I Implement four instances of Hadoop-based application and through those instances I have a Better understanding of the Hadoop programming model. Finally, The basic principle of HBase and rowkey design are introduced, and HBase write performance is analyzed, and the write performance was tuned to improve write performance in HBase, and design and implement the storage of GPS data based on HBase,which are Verified by experiments. |