
Data Resource Search And Preprocessing System For Facility Location

Posted on: 2021-04-28
Degree: Master
Type: Thesis
Country: China
Candidate: D Yang
GTID: 2518306050968039
Subject: Master of Engineering
Abstract/Summary:
Facility placement has received considerable research attention due to the proliferation of GPS-equipped mobile devices, and massive user location check-in data is valuable for facility placement research. Traditional target data resources, such as geographic information data and user trajectory data, are usually collected manually or by target objects carrying GPS equipment. These collection methods suffer from high acquisition cost, heavy workload, and low efficiency and effectiveness. To address these problems, and to take advantage of the growing volume of data resources on the Internet, this thesis designs and develops a data resource search and preprocessing system for facility placement. The system not only provides real-time data resources for facility placement but also supports the visualization of location check-in data. It is divided into two parts: data resource search and data preprocessing.

In the data resource search part, after studying the types of data resources needed for facility placement and the characteristics of the data available on major network platforms, we selected Weibo as the data collection platform because of its large user base, good real-time performance, and great influence. In view of the limitations the Sina API places on obtaining data resources, we designed and developed a focused web spider for Weibo. Given a location entered by the user, the spider first iteratively obtains the detailed locations within that area, and then obtains the detailed information of the users at each location, so as to collect all user data resources in the area. We successfully bypassed the website's anti-crawling restrictions and collected user check-in data by constructing a cookie pool and an IP proxy pool and by simulating login. To meet the system's data collection performance requirements, we further made the spider distributed, overcoming the performance limit of single-machine data collection.

In the data preprocessing part, for user check-in data, we integrated user trajectories by defining a mathematical model and performed map matching on the check-in data. For user blog posts, we first process the text through data cleaning, thesaurus expansion, Chinese word segmentation, and stop-word removal. We then use an improved K-Means algorithm to cluster the user blog data. Finally, we use statistical analysis to study the differences between categories, describe each category group with appropriate tags, and build the user profile.

In addition to the data resource search and preprocessing functions, we also integrated trajectory data visualization and facility placement functions into the system. For data visualization, the system provides three display modes: Heatmap, Road, and Trail. For facility placement, the system integrates three general solutions, Place-One, Place-k, and Incremental-One, which improves the user experience. System testing shows that the system can obtain user data resources and process data effectively in real time. Compared with traditional data acquisition methods, the system offers real-time data acquisition, large data volume, low development cost, simple operation, and rich functionality.
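
The abstract mentions map matching of check-in data but does not say which algorithm the thesis uses. As a rough illustration only, the sketch below shows the simplest geometric variant: snapping a check-in point to the nearest road segment. The function names, the planar (x, y) coordinates, and the sample roads are all assumptions made for this example and are not taken from the thesis.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def project_onto_segment(p: Point, seg: Segment) -> Point:
    """Return the point on seg that is closest to p (planar coordinates)."""
    (ax, ay), (bx, by) = seg
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        # Degenerate segment: both endpoints coincide.
        return (ax, ay)
    # Parameter of the orthogonal projection, clamped to the segment.
    t = ((p[0] - ax) * dx + (p[1] - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return (ax + t * dx, ay + t * dy)


def match_to_road(p: Point, roads: List[Segment]) -> Tuple[int, Point, float]:
    """Return (road index, snapped point, distance) for the nearest road."""
    if not roads:
        raise ValueError("roads must not be empty")
    best = None
    for i, seg in enumerate(roads):
        q = project_onto_segment(p, seg)
        d = math.hypot(p[0] - q[0], p[1] - q[1])
        if best is None or d < best[2]:
            best = (i, q, d)
    return best


# Hypothetical data: two parallel roads and one check-in point.
roads = [((0.0, 0.0), (10.0, 0.0)), ((0.0, 5.0), (10.0, 5.0))]
index, snapped, dist = match_to_road((3.0, 1.0), roads)
print(index, snapped, dist)
```

A real system working with latitude and longitude would need a map projection or a geodesic distance, and would likely use trajectory context rather than matching each point independently.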
Keywords/Search Tags: Facility placement, Web spider, Map matching, User profile, Visualization