
Web Management System Based On Hadoop

Posted on: 2016-07-17
Degree: Master
Type: Thesis
Country: China
Candidate: F Jin
Full Text: PDF
GTID: 2298330467991884
Subject: Electronics and Communications Engineering
Abstract/Summary:
Computer hardware and software have advanced rapidly since the birth of the first computer in the 1940s. In 1956 a computer hard disk could store only 5 MB of data and was very expensive; by 2014 capacities had reached the terabyte level at prices acceptable to ordinary people. This remarkable growth of computer hardware has contributed greatly to the development of the whole computer industry and of many related industries, such as finance, and information technology has entered the big-data era. How do companies in different industries deal with data at the PB or even EB level? The answer is to build an enterprise-class big-data processing platform, though big-data processing is not easy for an ordinary programmer. This thesis mainly discusses how to build a platform that allows general managers in common firms to analyze and process large data sets.

Hadoop has attracted much attention in academia since it came into being in 2005. As an open-source large-data processing platform, Hadoop was born of the inspiration provided by three famous papers published by Google. It is powerful, designed specifically for big-data mining, and has good stability and scalability. What is even more attractive is that its hardware requirements are not demanding: a common computer with regular hardware can run the Hadoop platform smoothly. The key techniques of Hadoop are the HDFS file system and the MapReduce programming model. HDFS stores the large data files and is the data source for the MapReduce programs.
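The Map and Reduce functions mentioned above can be sketched in plain Python as a minimal word-count illustration (this shows only the programming model, not Hadoop's actual Java API; all function names here are hypothetical):

```python
from itertools import groupby
from operator import itemgetter

def map_phase(line):
    # Map: emit an intermediate (word, 1) pair for each word in a line
    for word in line.split():
        yield (word, 1)

def reduce_phase(word, counts):
    # Reduce: combine all counts that share the same key
    return (word, sum(counts))

def run_word_count(lines):
    # Shuffle/sort: group intermediate pairs by key, as Hadoop does
    pairs = sorted(kv for line in lines for kv in map_phase(line))
    return [reduce_phase(word, (c for _, c in group))
            for word, group in groupby(pairs, key=itemgetter(0))]

print(run_word_count(["big data", "big hadoop"]))
# → [('big', 2), ('data', 1), ('hadoop', 1)]
```

In a real Hadoop job the Map and Reduce phases run in parallel across the cluster, with HDFS supplying the input splits.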
MapReduce is the computational model for data processing and analysis, and the programmer must implement two important functions: Map and Reduce. In the process of data analysis, the Hadoop system reads the original data from the HDFS file system and then runs the MapReduce programs; finally we obtain the result information we want. However, writing such code is difficult and impractical for people who are not professional programmers. In order to develop a system available to common people, I used Hive, the data warehouse built on Hadoop technology, to build the web management system: we can query the large data files using Hive's HQL statements. A Hadoop cluster usually runs on the Linux operating system, which is not easy for general users, since most of us use Microsoft's Windows series. As a result, I developed a management system based on web technology: using this system, we simply open the browser, enter the website address, and after a short wait see the result we need on the screen. The communication between the web server host and the Hadoop cluster host is very important in this thesis.

From the above, there are two key technologies for developing a web management system based on Hadoop: the Hadoop platform and website development. Hadoop is in charge of storing the large data files and processing the data, while the website provides the web page through which we enter keywords and view the results. We enter keywords through the web page, and the web server captures the entered information and sends it to the Hadoop cluster host over the TCP protocol. A client program running on the cluster host analyzes the received data and generates the appropriate HQL statement; Hive then executes the HQL query, and the results are sent back to the web server, where we can view them in the web pages.
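The keyword-to-HQL exchange over TCP described above can be sketched in Python as follows. This is a minimal loopback illustration under assumptions: the table name `logs` and column name `content` are hypothetical, and a real cluster-side client would submit the generated statement to Hive (for example via its CLI or JDBC) instead of echoing it back:

```python
import socket
import threading

def build_hql(keyword, table="logs", column="content"):
    # Hypothetical table/column; escape quotes so user input
    # cannot break out of the HQL string literal
    safe = keyword.replace("'", "\\'")
    return f"SELECT * FROM {table} WHERE {column} LIKE '%{safe}%'"

def cluster_client(server_sock):
    # Runs on the Hadoop cluster host: receive a keyword,
    # generate the HQL statement, and send it back
    conn, _ = server_sock.accept()
    with conn:
        keyword = conn.recv(1024).decode("utf-8")
        conn.sendall(build_hql(keyword).encode("utf-8"))

def web_server_query(keyword, host, port):
    # Runs on the web server host: forward the user's keyword over TCP
    with socket.create_connection((host, port)) as s:
        s.sendall(keyword.encode("utf-8"))
        s.shutdown(socket.SHUT_WR)
        return s.recv(4096).decode("utf-8")

# Loopback demo of the web-server-to-cluster exchange
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
port = srv.getsockname()[1]
t = threading.Thread(target=cluster_client, args=(srv,))
t.start()
print(web_server_query("hadoop", "127.0.0.1", port))
t.join()
srv.close()
# → SELECT * FROM logs WHERE content LIKE '%hadoop%'
```

Escaping (or, better, parameterizing) the keyword matters because the statement is assembled from user input taken directly from the web page.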
This is the whole process that this thesis mainly discusses.
Keywords/Search Tags: big data, hadoop, hive, web