
Research On Several Data Mining And Cleaning Algorithms On The Massive RFID Data In Internet Of Things

Posted on: 2013-05-18
Degree: Master
Type: Thesis
Country: China
Candidate: Z P Lu
GTID: 2268330395490812
Subject: Computer application technology
Abstract/Summary:
In recent years, with the rapid development of RFID (Radio Frequency Identification) technology and the falling cost of tags, RFID has been adopted in many areas, such as supply chain management, object tracking, medical treatment and logistics. Big retailers like Wal-mart, Target and Albertsons have already deployed RFID systems in their logistics centers and warehouses, and require their suppliers to attach RFID tags to small packs of merchandise. One of the biggest problems of RFID is the enormous volume of data it has to handle. Even a properly deployed RFID system generates several terabytes of data, because every tagged object sends data to the reader constantly. Meanwhile, because of environmental interference and the instability of the radio frequency signal, the received RFID data are unreliable and noisy. Data of this quality cannot be passed to high-level enterprise applications. These problems have severely restricted the popularization and development of RFID technology, so cleaning RFID data effectively and efficiently has become an urgent issue.
Traditional data cleaning technology cannot fully meet the needs of cleaning RFID data streams. Existing techniques mainly rely on window smoothing and on space-time correlation strategies that use historical readings at the data level. These methods may perform well in single-reader scenarios, but they are not suitable for data cleaning across multiple logical areas. Building on a study of RFID data cleaning technology in China and abroad, this paper investigates problems related to false positive readings, false negative readings and duplicate readings.
The main innovations of this paper are as follows.
1) Traditional data cleaning algorithms mainly address missed readings and ignore false positive readings, so this paper proposes the HTB algorithm. By setting a noise threshold, the algorithm removes data whose read count within a given period of time falls below the threshold. It stores the count data in hash table structures, which reduces the complexity of the counting operation and raises cleaning efficiency.
2) Traditional data cleaning algorithms mainly fill in missing data using space-time correlation at the data level, so they are not suitable for RFID application scenarios with track information spanning multiple logical areas. After studying the characteristics of RFID track data, this paper proposes a track data filling algorithm based on movement recency. The algorithm maintains a track event tree built from historical data, which it uses to predict future data and guide the data cleaning. It also takes into account how movement rules change over time and introduces an ageing factor for maintaining the track event tree, which improves the prediction accuracy of the tree and the accuracy of the filling algorithm.
3) The original data set collected by the readers is very large and highly redundant, which makes it unsuitable for the subsequent cleaning process. This paper proposes a redundancy deletion algorithm that, by setting a time tolerance threshold, removes redundant data and reduces the data volume, so that redundant readings are not refilled in the later cleaning steps. In addition, because different RFID applications are subject to different constraints, this paper proposes a constraint-based data cleaning algorithm that guides the cleaning with both self-learned and user-specified constraints and thereby improves cleaning accuracy.
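The three ideas above (threshold counting with a hash table, a time tolerance for duplicates, and an ageing factor for movement history) can be illustrated with a minimal Python sketch. It is not the thesis's HTB algorithm or its track event tree, whose details the abstract does not give. The reading format `(tag_id, reader_id, timestamp)`, the fixed-window bucketing, the names `filter_low_count`, `drop_duplicates` and `AgedTransitions`, and the flat transition table used in place of a tree are all assumptions made for illustration.

```python
from collections import defaultdict

# Assumed reading format (hypothetical): (tag_id, reader_id, timestamp_seconds).


def filter_low_count(readings, window, noise_threshold):
    """Drop readings whose (tag, reader) pair appears fewer than
    noise_threshold times inside the same fixed time window."""
    counts = defaultdict(int)  # hash table: (tag, reader, window index) -> count
    for tag, reader, ts in readings:
        counts[(tag, reader, int(ts // window))] += 1
    return [
        (tag, reader, ts)
        for tag, reader, ts in readings
        if counts[(tag, reader, int(ts // window))] >= noise_threshold
    ]


def drop_duplicates(readings, time_tolerance):
    """Keep a reading only if the last kept reading of the same
    (tag, reader) pair is more than time_tolerance seconds older."""
    last_kept = {}
    kept = []
    for tag, reader, ts in sorted(readings, key=lambda r: r[2]):
        key = (tag, reader)
        if key not in last_kept or ts - last_kept[key] > time_tolerance:
            kept.append((tag, reader, ts))
            last_kept[key] = ts
    return kept


class AgedTransitions:
    """Area-to-area movement weights in which older observations decay.

    A flat stand-in for a track event tree: each new observation from an
    area first multiplies that area's existing weights by the ageing factor.
    """

    def __init__(self, ageing=0.9):
        self.ageing = ageing
        self.weights = defaultdict(lambda: defaultdict(float))

    def observe(self, src, dst):
        row = self.weights[src]
        for area in row:
            row[area] *= self.ageing  # age the older movements
        row[dst] += 1.0

    def predict(self, src):
        """Return the most likely next area, or None if src is unseen."""
        row = self.weights.get(src)
        if not row:
            return None
        return max(row, key=row.get)
```

In this sketch a missing reading for a tag last seen in area `src` could be filled with `predict(src)`. With an ageing factor below 1, recent movements outweigh equally frequent older ones, which is the intuition behind recency-based filling.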
Keywords/Search Tags: Radio Frequency Identification (RFID), data mining, data stream cleaning, data filling, constraint strategy