Research On Semantic-based Trajectory Flow Data Cleaning Method

Posted on: 2021-04-06    Degree: Master    Type: Thesis
Country: China    Candidate: Y W Jiang    Full Text: PDF
GTID: 2428330623965412    Subject: Computer system architecture
Abstract/Summary:
In recent years, with the development of Internet of Things technology, sensors and portable mobile devices of many kinds have entered people's daily lives, recording their activities and transmitting the information as streaming data. Studying this streaming data matters not only for summarizing the characteristics of different objects but, more importantly, for uncovering latent knowledge that may change people's daily lives. Streaming data mining has therefore attracted sustained attention from researchers. The first task in streaming data mining is to clean the data, which improves its quality and prevents noisy data from biasing the mining results.

Existing stream data cleaning methods have two problems. First, they effectively improve data quality but ignore data volume, which increases storage pressure and can cause runtime memory overflow. Second, they ignore the semantic information of the data during cleaning, which degrades the quality of its spatiotemporal properties. In view of these problems, this thesis pursues three goals: improving stream data quality, reducing stream data volume, and preserving the semantic information of the stream data.

This thesis studies and improves methods suited to stream data cleaning. A sliding window model is used to obtain a subset of the stream data, and an improved method for extracting stay points and movements is applied to that subset, so that data quality is improved and the data is compressed. The thesis also uses semantic information as one of the conditions for data cleaning, which improves cleaning quality and compresses the data volume, reducing the cost of data storage.

The experimental data is spatiotemporal data of shopping mall customers collected in a real setting. First, the sliding window model is used to obtain a subset of data from the large-scale streaming data. Next, an improved stay point detection method is applied: the semantic stay points are obtained first, and each area that satisfies the conditions and is identified as a semantic stay point is divided into a grid, forming smaller candidate areas. Noise detection and elimination are then performed on each candidate area to obtain its cleaned data. Finally, the data from all candidate areas is combined to obtain the final cleaned data set.

Two methods are used to verify the quality of the cleaned data. The first compares the results with those of another data cleaning method applied to the same data. The second uses a clustering algorithm to obtain the cluster shapes of the data. The experimental results show that the method is superior in improving data quality and effective in reducing data volume. An application scenario was added during the clustering analysis. Because of the limited memory of a stand-alone device, the traditional density-based clustering method cannot complete the clustering. This thesis therefore uses an improved semantic grid-based clustering algorithm, SGSCAN, to cluster each stay point candidate area, and then maps the clustering results back to the original study area for summary analysis to find hot spots in the shopping mall.
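The first two steps described above, windowing the stream and extracting stay points, can be illustrated with a minimal sketch. It is not the thesis's improved, semantics-aware method: it uses a count-based sliding window and the classic distance-and-duration stay point rule. The names `Point`, `sliding_windows`, `stay_points`, `dist_max`, and `dur_min` are hypothetical, and planar coordinates in metres are assumed.

```python
from collections import deque
from math import hypot
from typing import Iterable, Iterator, List, NamedTuple


class Point(NamedTuple):
    t: float  # timestamp in seconds (assumed unit)
    x: float  # planar coordinates in metres (assumed unit)
    y: float


def sliding_windows(stream: Iterable[Point], size: int, step: int) -> Iterator[List[Point]]:
    """Yield count-based windows of `size` points, advancing by `step` points.

    A trailing partial window is not emitted.
    """
    buf = deque(maxlen=size)  # keeps only the most recent `size` points
    since_emit = 0
    for p in stream:
        buf.append(p)
        since_emit += 1
        if len(buf) == size and since_emit >= step:
            yield list(buf)
            since_emit = 0


def stay_points(points: List[Point], dist_max: float, dur_min: float) -> List[Point]:
    """Classic stay point detection (not the thesis's improved variant).

    A run of consecutive points that stays within `dist_max` of its first
    point for at least `dur_min` seconds is replaced by its centroid, which
    also compresses the data. Points outside such runs are dropped here.
    """
    result = []
    i, n = 0, len(points)
    while i < n:
        j = i + 1
        # Extend the run while points stay close to the anchor points[i].
        while j < n and hypot(points[j].x - points[i].x,
                              points[j].y - points[i].y) <= dist_max:
            j += 1
        if points[j - 1].t - points[i].t >= dur_min:
            run = points[i:j]
            result.append(Point(
                t=run[0].t,
                x=sum(p.x for p in run) / len(run),
                y=sum(p.y for p in run) / len(run),
            ))
            i = j  # continue after the detected stay
        else:
            i += 1
    return result
```

In use, each window produced by `sliding_windows` would be passed to `stay_points`, and the resulting centroids would then be candidates for the grid division and noise elimination steps, which this sketch does not cover.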
Keywords/Search Tags:stream data cleaning, semantics, data quality, cluster analysis, hotspot analysis