Research On Semantic-based Trajectory Flow Data Cleaning Method

Posted on:2021-04-06

Degree:Master

Type:Thesis

Country:China

Candidate:Y W Jiang

Full Text:PDF

GTID:2428330623965412

Subject:Computer system architecture

Abstract/Summary:

In recent years, with the development of Internet of Things technology, sensors and portable mobile devices of many kinds have entered people's daily lives, recording everyday activity and transmitting it as streaming data. Studying this streaming data matters not only for summarizing the characteristics of different objects but, more importantly, for uncovering knowledge that may change people's daily lives. Streaming data mining has therefore long attracted wide attention from researchers. Its first task is to clean the streaming data, improving data quality so that noisy data does not bias the mining results. Existing stream data cleaning methods have two problems. First, they improve data quality effectively but ignore data volume, which increases storage pressure and can cause memory overflow at run time. Second, they ignore the semantic information of the data during cleaning, which degrades the quality of its spatiotemporal properties. In view of these problems, this paper pursues three goals: improving stream data quality, reducing stream data volume, and preserving the semantic information of stream data.

This paper studies and improves a method suited to stream data cleaning. A sliding window model is used to obtain a subset of the stream, and an improved stay-point and movement extraction method is applied to that subset, which both improves data quality and compresses the data. At the same time, semantic information is used as one of the conditions for cleaning, which improves cleaning quality and reduces data volume, lowering the cost of data storage.

The experimental data is spatiotemporal data of shopping mall customers collected from a real scene. First, the sliding window model obtains a subset of data from the large-scale stream. Next, an improved stay point detection method is applied: semantic stay points are obtained first, and each area that satisfies the conditions and is identified as a semantic stay point is divided into a grid, forming smaller candidate areas. Noise detection and elimination are then performed on each candidate area to obtain its cleaned data. Finally, the data from all candidate areas is combined into the final cleaned data set.

Two methods are used to verify the quality of the cleaned data. The first compares the results with those of another data cleaning method applied to the same data. The second applies a clustering algorithm and examines the shape of the resulting clusters. Experimental results show that the method is superior in improving data quality and effective in reducing data volume. An application scenario was added during the cluster analysis. Because of the limited memory of a stand-alone machine, traditional density-based clustering cannot be completed on this data. This paper therefore uses an improved semantic grid-based clustering algorithm, SGSCAN, to cluster each stay point candidate area, and then maps the clustering results back to the original study area for summary analysis to find hot spots in the shopping mall.
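
The first two steps described in the abstract, taking a window from the stream and extracting stay points from it, can be illustrated with a minimal Python sketch. This is not the thesis's improved semantic method: it uses a plain count-based sliding window and the classic distance/time-threshold stay point detection, without the semantic conditions or the grid step. The point layout `(x, y, t)`, the function names `sliding_windows` and `stay_points`, and the parameters `size`, `step`, `max_dist` and `min_time` are all assumptions made for this example.

```python
from collections import deque
from math import hypot
from typing import Iterable, Iterator, List, Tuple

# Assumed point layout: planar x, y (e.g. metres inside the mall) and time t (seconds).
Point = Tuple[float, float, float]
# Assumed stay point layout: mean x, mean y, arrival time, leave time.
StayPoint = Tuple[float, float, float, float]


def sliding_windows(stream: Iterable[Point], size: int, step: int) -> Iterator[List[Point]]:
    """Yield count-based windows of `size` points, advancing `step` points each time."""
    if not 0 < step <= size:
        raise ValueError("step must satisfy 0 < step <= size")
    window: deque = deque(maxlen=size)  # oldest points fall out automatically
    since_last = 0
    for point in stream:
        window.append(point)
        since_last += 1
        if len(window) == size and since_last >= step:
            yield list(window)
            since_last = 0


def stay_points(points: List[Point], max_dist: float, min_time: float) -> List[StayPoint]:
    """Classic stay point detection: a run of points that stays within
    `max_dist` of its first point for at least `min_time` becomes one stay point."""
    stays: List[StayPoint] = []
    i, n = 0, len(points)
    while i < n:
        j = i + 1
        # Extend the run while points remain within max_dist of the anchor point i.
        while j < n and hypot(points[j][0] - points[i][0],
                              points[j][1] - points[i][1]) <= max_dist:
            j += 1
        arrive, leave = points[i][2], points[j - 1][2]
        if leave - arrive >= min_time:
            run = points[i:j]
            mean_x = sum(p[0] for p in run) / len(run)
            mean_y = sum(p[1] for p in run) / len(run)
            stays.append((mean_x, mean_y, arrive, leave))
            i = j  # the whole run is replaced by one stay point
        else:
            i += 1  # not a stay; try the next point as anchor
    return stays


if __name__ == "__main__":
    track = [(0.0, 0.0, 0.0), (1.0, 0.0, 10.0), (0.0, 1.0, 20.0),
             (50.0, 50.0, 30.0), (51.0, 50.0, 35.0)]
    for window in sliding_windows(track, size=5, step=5):
        print(stay_points(window, max_dist=5.0, min_time=15.0))
```

Replacing each qualifying run with a single stay point is what gives the compression effect the abstract mentions: many raw samples collapse into one record. The semantic conditions and the per-area noise elimination described in the abstract would be applied on top of this and are not shown here.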

Keywords/Search Tags:

stream data cleaning, semantics, data quality, cluster analysis, hotspot analysis

Related items

1	Data Stream Processing Algorithm Based On Cluster Analysis
2	The Design And Implementation Of Data Analysis System For Data Cleaning
3	Research On Data Cleaning Based On Science And Technology Innovation Big Data Public Platform
4	Research And Application Of Data Cleaning In Guizhou Local Tax Projects
5	Data Quality Analysis And Optimization In Public Security Intelligence Based On ETL
6	Research On Data Cleaning Technology With The Design And Implementation Of Data Cleaning Framework
7	Research On Uncertain Data Stream Database System
8	Clustering-Based Hotspot Analysis And Alarm Compression For Mobile Internet
9	Data Clean And Cluster Analysis For Path Data From RFID Agricultural Products Tracing System
10	Research On Key Technologies In Data Stream Analysis