
Research On Internet Of Things Oriented Cleaning And Storage For Unreliable RFID Data Set

Posted on: 2014-12-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H Fan
GTID: 1268330422974311
Subject: Computer Science and Technology
Abstract/Summary:
The Internet of Things (IoT) aims to combine human society and the physical world organically. It lets people perceive the world in a more refined and dynamic way, and it enables management and control that raise the overall level of informatization. RFID is one of the most important information technologies in the IoT. It is widely used in logistics and warehousing, supply chain management, asset management, personnel monitoring, indoor positioning and tracking, and other fields. RFID is a contactless radio frequency identification technology. By scanning RFID tags, readers obtain the location and time information of the tags in real time, so that the tags and the items they are attached to can be tracked and located. Each reading is usually expressed as a tuple (tag_ID, loc, time). However, missed reads are frequent: approximately 30% of RFID readings are missed. As a result, the raw data collected by RFID readers is seriously unreliable and incomplete. How to clean this unreliable data and store it efficiently is a key problem for RFID-based IoT applications, and it is the focus of this paper.

Based on an in-depth analysis of the unreliability, high redundancy, and massive volume of RFID data, this paper has two aims. The first is to improve the precision and efficiency of information queries in RFID-based IoT applications. The second is to reduce the storage overhead of RFID data. To these ends, it presents models and solutions for data cleaning at the physical layer, data filling at the logical layer, and massive data storage and query optimization. The main contributions of this paper are as follows:

1. We propose an unreliable RFID data cleaning method based on a probability model of tag motion.
At the physical layer, missed reads cause RFID data loss. To address this, we model the RFID data stream as a sequence of Bernoulli trials (a binomial distribution) and introduce a probability model of the tag motion state. We then establish a conversion between the raw RFID data and the motion-state information of tags (speed, direction, and displacement), so that missed data can be filled in according to each tag's motion state. Finally, a reverse filtering mechanism over the data sequence is proposed to further ensure that the motion-state information of tags is captured. Experimental results show that this cleaning method is more accurate than the classic sliding-window smoothing technique.

2. We propose an RFID trajectory data cleaning method based on the Hidden Markov Model (HMM). At the logical layer, trajectory information is incomplete in RFID-based indoor tracking and positioning systems. To address this, we map the reading sequence of the system's readers to the observable state sequence of an HMM, and we map the tag's position sequence to the hidden state sequence. Trajectory cleaning thus becomes the classic HMM decoding problem. Building on the classic Viterbi decoding algorithm, we present an efficient path decoding algorithm. Experimental results show that the proposed algorithm fills in missed trajectories efficiently and accurately, which supports accurate information queries. Compared with traditional methods, it greatly improves both the accuracy and the processing performance of data cleaning.

3. We propose an unreliable RFID data cleaning approach based on Bayesian inference. Missed reads can cause supply chain companies to respond incorrectly to market demand, leading to large economic losses.
The aim is to obtain accurate real-time receiving and shipping information for the tracked items. To this end, this paper first presents a path matching algorithm based on a path code schema. The algorithm efficiently obtains the distribution of tags whose historical path is the same as the current tag's. On this basis, a differentiated decision model driven by path information is proposed. It applies different decision schemes to missed tags with different historical paths, which makes the cleaning results more accurate. Finally, a sliding time window model is introduced to reduce computational overhead. It uses a maximum entropy model to adjust the window size dynamically, striking a better balance between efficiency and accuracy. Experimental results show that the proposed method improves data cleaning accuracy in the supply chain domain and also scales better.

4. For massive RFID data storage and query optimization, we propose an RFID data storage model based on a split-path schema. Storing and processing such huge volumes of RFID data requires ever more space and time, and it is increasingly recognized that existing approaches cannot meet the requirements of RFID data management. First, building on path-based storage solutions, a tree-structure-based path splitting approach is proposed. It splits the movement paths of products intelligently and automatically according to user requirements. Next, we present a split-path schema-based RFID data storage model. Its data separation mechanism lets the massive RFID data produced by supply chain management systems be clustered, stored, and processed more efficiently. Finally, based on the proposed storage model, we design a relational schema to store the path and time information of tags, and we define several typical query templates and their SQL statements.
Experimental resultsshow that compared with the path encoding schema-based storage model, the proposedstorage model can significantly improve the path-oriented query performance. Moreover, the storage overhead of our model is only about12%of that of the raw RFID data.
Keywords/Search Tags: Internet of Things, RFID Technology, Unreliable Data, Data Cleaning, Data Storage, Bayesian Inference, Split-Path Schema