
The Research Of Real-time Processing Based On Big Data Stream

Posted on: 2016-06-16
Degree: Master
Type: Thesis
Country: China
Candidate: F Xu
GTID: 2298330467461850
Subject: Computer application technology
Abstract/Summary:
In recent years, with the rapid development of the computer industry and the fast expansion of computer application fields, "big data" has become a common term throughout the IT industry. A report from the American Internet Data Center stated that more than 90 percent of the world's data was generated in recent years and that data on the Internet will grow at an annual rate of 50 percent. Big data is everywhere. It refers not only to information posted on the Internet but also to information on temperature, humidity, water pH, vibration, position, and even chemical changes in atmospheric composition, detected by the many kinds of sensors deployed in industrial equipment, meters, and cars. Such data is not only huge in volume, ranging from GB to TB and even PB, but also varied in type, from simple log files and pictures to complex video, geographic location information returned from satellites, and so on. Traditional approaches to real-time processing of data streams run into many problems when coping with this kind of data.

This paper discusses the problems that traditional real-time processing exhibits when coping with big data streams, in both a centralized environment and a distributed environment.

First, this paper focuses on the centralized environment. There, real-time processing over large data streams ensures short system response time mainly by sacrificing quality of service. Three main problems need to be solved: synchronization, consistency, and concurrency. This paper proposes an optimization strategy that can considerably improve system performance while effectively alleviating I/O concurrency conflicts.
Details are as follows:
(1) To solve the problems above, we designed a circular data buffer to store data and adopted a multi-threading mechanism and a message delivery method to meet the synchronization and consistency requirements.
(2) We proposed a new method for handling intermediate data: memory is allocated temporarily, and the data, classified according to pre-defined classification criteria, is held in dynamic memory. When the number of memory blocks exceeds the preset maximum, or the time a block has stayed in memory exceeds the allowed limit, the method writes all blocks to the hard disk. This effectively reduces the number of read-write head repositionings and eases I/O concurrency conflicts.

Under the distributed environment, big data streams usually exist in the form of workflows. To ensure short response times, we analyzed and solved the problems using Hadoop, the most popular distributed system platform, and then proposed a real-time workflow scheduling method based on cluster topology. A large number of experimental results show that this method effectively reduces the average completion time and waiting time compared with the traditional FIFO algorithm. The main work is as follows:
(1) From the computing capability of each TaskTracker and the data size, we estimate the completion time of the workflows in the map stage and derive a membership function; from the topological structure of the cluster, we derive a distance membership function for the TaskTrackers in the cluster. Using the two functions, the overall task flow processing time and data transfer time can be estimated. This method reduces task waiting time and improves resource utilization effectively. In addition, a task priority queue was adopted to meet the needs of different types of jobs.
(2) The algorithm also prioritizes workflows to meet the needs of different types of jobs.
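The intermediate-data method in item (2) of the centralized strategy (hold classified data in memory blocks, then write all blocks out once a block-count or stay-time limit is exceeded) can be illustrated with a minimal Python sketch. This is not the thesis's implementation. The class name `ClassifiedWriteBuffer`, the `classify` and `sink` callables, and the parameters `max_blocks` and `max_age_s` are all hypothetical, and the sketch assumes one memory block per class and a single-threaded caller.

```python
import time
from collections import defaultdict


class ClassifiedWriteBuffer:
    """Hypothetical sketch: hold records in per-class memory blocks and
    write every block out in one batch when a limit is exceeded."""

    def __init__(self, classify, sink, max_blocks=4, max_age_s=1.0,
                 clock=time.monotonic):
        self.classify = classify      # assumed: maps a record to its class key
        self.sink = sink              # assumed: callable(key, records), stands in for a disk write
        self.max_blocks = max_blocks  # preset maximum number of memory blocks
        self.max_age_s = max_age_s    # allowed stay time in memory, in seconds
        self.clock = clock
        self.blocks = defaultdict(list)
        self.oldest = None            # arrival time of the oldest buffered record

    def add(self, record):
        now = self.clock()
        if self.oldest is None:
            self.oldest = now
        self.blocks[self.classify(record)].append(record)
        too_many = len(self.blocks) > self.max_blocks
        too_old = now - self.oldest > self.max_age_s
        if too_many or too_old:
            self.flush()

    def flush(self):
        # Write all blocks, one class at a time, so each class is handed
        # to the sink as a single batch instead of many scattered writes.
        for key in sorted(self.blocks):
            self.sink(key, self.blocks[key])
        self.blocks.clear()
        self.oldest = None
```

Grouping records by class before writing is what lets a real disk-backed sink emit each class as one sequential run. Whether that matches the thesis's exact block layout is an assumption of this sketch.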
Keywords/Search Tags: MapReduce, priority level, topology, resource utilization, shared memory mechanism