Design and Application of Data Analysis Based on the Hadoop Platform

Posted on: 2012-07-28
Degree: Master
Type: Thesis
Country: China
Candidate: W Jiang
GTID: 2178330335960779
Subject: Control theory and control engineering
Abstract/Summary:
Faced with the huge amount of data on the Internet, a single host can no longer meet the requirements of storage and computing. Using distributed storage and distributed computing to analyze this data and uncover its intrinsic value has become an inevitable trend. Hadoop is the most popular distributed storage and computing framework in this area, and it is already in use on many large websites. Among these applications, a common one is log analysis, and distributed implementations of graph algorithms are also applied. Service platform logs and the Internet community as a whole produce large volumes of data that are written once and read many times, which fits distributed computing application scenarios well.

First, this thesis analyzes the Hadoop storage system, its distributed computing framework, and the storage and computation characteristics of Hadoop. Based on the Hadoop platform, it presents a detailed design for analyzing the relationship between URLs and user click frequency, and for analyzing query correlation in search engines. Distributed algorithms for single-source shortest path and web page quality evaluation are also implemented. Drawing on this data analysis, on the experience of designing and implementing the programs, and on the characteristics of the Hadoop system, the thesis designs and implements on the Hadoop platform performance optimizations for Map/Reduce distributed applications, secondary sort, and the data join scheme used in RDBMSs.

Then, an experimental test environment is built. On this platform, the log statistics programs, the query term correlation analysis algorithms, and the distributed single-source shortest path algorithm are implemented and analyzed. Methods for designing and optimizing distributed algorithms on this platform are proposed and tested.

Finally, the thesis summarizes the existing problems of the Hadoop distributed storage and computing framework and proposes research directions that could be pursued further.
Keywords/Search Tags: Hadoop, distributed storage, distributed computing, data-analysis
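
The abstract mentions computing the relationship between URLs and user click frequency as a Map/Reduce job. A minimal sketch of that idea follows, written as an in-memory Python simulation and not against the Hadoop API. The log format (one `user<TAB>url` record per line) and all function names are assumptions made for illustration; the thesis's actual record layout and implementation are not given here.

```python
from collections import defaultdict


def map_click(line):
    """Map step: emit ((url, user), 1) for one log line.

    Assumes a hypothetical 'user<TAB>url' record format.
    """
    user, url = line.rstrip("\n").split("\t")
    yield (url, user), 1


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle phase."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_count(key, values):
    """Reduce step: sum the click counts collected for one (url, user) key."""
    return key, sum(values)


def click_frequency(lines):
    """Run map, shuffle, and reduce over an iterable of log lines."""
    pairs = (pair for line in lines for pair in map_click(line))
    return dict(reduce_count(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    log = [
        "alice\thttp://a.example/",
        "bob\thttp://a.example/",
        "alice\thttp://a.example/",
    ]
    for (url, user), clicks in sorted(click_frequency(log).items()):
        print(url, user, clicks)
```

Keying on the `(url, user)` pair keeps each reducer's work to a simple sum; in a real job the same summing function could plausibly also serve as a combiner to reduce shuffle traffic, though whether the thesis does this is not stated.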