Research On Technology Of Storage And Management Of IoT Data

Posted on:2018-10-07

Degree:Doctor

Type:Dissertation

Country:China

Candidate:X J Hao

GTID:1318330512985622

Subject:Computer application technology

Abstract/Summary:

The rapid development of the Internet of Things (IoT) enables people to gather and analyze massive amounts of sensing data by deploying various kinds of sensors. As sensing and network technologies advance, more and more sensors are deployed to collect data in applications such as environment monitoring, location-based services, and human-centric pervasive computing, and IoT is becoming one of the most important sources of big data. Efficiently storing and managing IoT big data has therefore become a critical issue, and it is not a trivial task.

First, for "persistent storage", massive numbers of sensors frequently generate and send new data to the data center, forming a data stream on the order of GB/s. Large-scale distributed file systems such as HDFS support big data storage, but they cannot meet the growing need for real-time, high-performance online storage of large data. In addition, as the amount of data grows rapidly, the size of the metadata also increases dramatically, and traditional metadata architectures, metadata backup management, and dynamic metadata load balancing increasingly struggle to meet the needs of large data applications.

Second, for "data retrieval", a retrieval system is needed to quickly retrieve the data held in persistent storage. However, current retrieval systems, such as relational databases and NoSQL databases, cannot effectively meet the needs of IoT big data. For example, NoSQL databases design their read and write methods, index structures, query execution, query optimization, and recovery strategies around disk storage, and the inherently high read and write latency of disks limits big data storage and, in particular, the analysis performance of big data.

Third, for "data analysis", data cubes are needed to achieve efficient statistical analysis. However, traditional data cubes, such as Hive, can only analyze certain (deterministic) data. When facing the uncertain data of the Internet of Things, they take hours to complete a statistical analysis and cannot meet the demands of practical applications.

Finally, data storage, retrieval, and analysis run as stream tasks in the data center. Considering that 40% of operation and maintenance costs are energy costs, scheduling tasks in an energy-saving way becomes the key to reducing data center cost. However, current task scheduling platforms, with Hadoop YARN as the representative, do not support energy-saving task scheduling.

In summary, existing data storage and maintenance technologies have limitations when facing IoT big data. This paper presents a data storage and management system framework for IoT big data (Sensor Storage). Sensor Storage is a platform for distributed data storage, retrieval, and analysis, and it mainly includes the following key technologies.

(1) Distributed file system for massive small files. A distributed storage system named SensorFS is built on the basis of HDFS. The system can quickly store massive small files and optimize queries over them, and it also provides high scalability and data security. We put forward optimization mechanisms and algorithms for massive small files, theoretically analyze and model the write bottleneck of small files, design optimization strategies for writing small files, and further propose an optimization mechanism for reading massive small files in HDFS.

(2) Space-efficient online index system for key-value data. RadixKV, an online index system for key-value data based on the Radix Tree, is established. It provides fast keyword-based retrieval over massive content in the distributed file system according to different application requirements. The study analyzes the advantages and disadvantages of the Radix Tree and its online update performance, and designs a fast online update strategy for the Radix Tree based on parallel sorting. A space-optimized representation of the Radix Tree, called Radix Array, is put forward; its data structure is designed and its space cost is analyzed.

(3) Data cube system for probabilistic data. This paper analyzes the "uncertainty" of big data in the Internet of Things and designs ProbabilisticCube, a data cube system for probabilistic data that provides fast aggregation queries over such data. The work defines a probabilistic data model for IoT big data; designs the probabilistic data cube based on that model; designs high-performance aggregation operations for probabilistic data; designs a concrete implementation strategy for the data cube based on an estimation model of materialization cost; and designs slice and dice queries for probabilistic data.

(4) Energy-efficient task scheduling framework. A new distributed task scheduling framework, Green Yarn, is established as an extension of Hadoop YARN. The framework schedules the streaming tasks of the IoT appropriately: combined with DVFS (dynamic voltage and frequency scaling), it matches tasks to NodeManagers (NM) without loss of performance. We design a task-based energy efficiency model and design task scheduling algorithms for both offline and online tasks.

Through the systematic research in this paper, a new storage framework for IoT big data is expected to be established. Through the design of the file system and the design and optimization of big data retrieval and analysis, innovative designs are put forward to solve these basic problems. The research alleviates the pressure of storing and managing IoT big data. It can further be realized as a prototype system, support further verification, experimentation, and application of efficient big data storage and management, and provide new ideas for big data management theory and systematic methods.
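The abstract does not give the probabilistic data model or the aggregation operators used by ProbabilisticCube. As a rough illustration of the general idea only, the sketch below assumes a simple tuple-level uncertainty model in which each sensor reading carries a probability of being valid, and a cube cell stores the expected COUNT and expected SUM. The names `Reading`, `build_cube`, and `slice_by_sensor`, the two dimensions, and the sample values are all hypothetical and are not taken from the dissertation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """One uncertain sensor reading (hypothetical model, not the thesis's)."""
    sensor: str   # dimension 1
    hour: int     # dimension 2
    value: float  # measured value
    prob: float   # probability that the reading is valid, in [0, 1]


def build_cube(readings):
    """Aggregate [expected COUNT, expected SUM] per (sensor, hour) cell.

    By linearity of expectation, E[COUNT] is the sum of the probabilities
    and E[SUM] is the sum of prob * value over the readings in the cell.
    """
    cube = defaultdict(lambda: [0.0, 0.0])
    for r in readings:
        cell = cube[(r.sensor, r.hour)]
        cell[0] += r.prob
        cell[1] += r.prob * r.value
    return dict(cube)


def slice_by_sensor(cube, sensor):
    """Slice query: fix the sensor dimension, keep the hour dimension."""
    return {hour: agg for (s, hour), agg in cube.items() if s == sensor}


# Made-up sample data, for illustration only.
readings = [
    Reading("s1", 0, 20.0, 0.5),
    Reading("s1", 0, 30.0, 1.0),
    Reading("s1", 1, 10.0, 0.25),
    Reading("s2", 0, 40.0, 0.5),
]
cube = build_cube(readings)
s1 = slice_by_sensor(cube, "s1")
```

Storing expected aggregates per cell keeps each cell mergeable by simple addition, which is one plausible reason a cube over uncertain data can be precomputed. Whether the dissertation materializes expectations, full distributions, or something else is not stated in the abstract.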

Keywords/Search Tags:

Big data, Distributed file system, Data retrieval, Cube, Energy-efficient task scheduling

Related items

1	Research Of Energy-Efficient Scheduling For Parallel Real-Time Tasks On Multicore Systems
2	Research On Energy-efficient Scheduling With Reliability Constraint For Embedded Systems
3	Research And Implementation Of Distributed Cube Distributed Storage And Construction Algorithm
4	Optimal Energy-efficient And Balancing Scheduling Algorithms For Tasks Considering Completion Time In Cloud Data Center
5	Multidimensional Data Model For Mining And Analysis Based On Multiple Structure Data Cube
6	Energy-efficient Online Scheduling Algorithm With The Life Cycle Consideration Of Virtual Machines In Cloud Data Centers
7	Research On Distributed Query Of Quotient Cube Based On Spark
8	Research Of Minimizing Energy Consumption Scheduling Algorithm For Parallel Real-Time Tasks In Multicore Systems
9	Research On The Efficient Materialization And Fast Query Of Condensed Data Cube
10	Scheduling Of Distributed Collaborative Tasks And Adaptive Forwarding On MANET