
Research and Application of Data Cleaning in the Construction of a POI Data Warehouse

Posted on: 2017-10-04; Degree: Master; Type: Thesis
Country: China; Candidate: J L Zheng
GTID: 2348330509955021; Subject: Information and Communication Engineering
Abstract/Summary:
The mobile Internet has transformed people's way of life. To attract customers, merchants publish all kinds of offers through Internet Plus channels, but this discount information is scattered across the Internet: users may need to install several apps or visit different websites to take advantage of deals from nearby businesses, so the offers reach few users. Collecting POI (point of interest) information and building a data warehouse from it requires importing large amounts of data from many heterogeneous source systems. These sources have a variety of quality problems, which can produce erroneous analysis results in front-end decision support applications and degrade the quality of information services. Data cleaning is an important way to improve data quality.

This thesis first introduces the concepts, principles, and significance of data cleaning and the state of research in China and abroad, focusing on a comparative analysis of the characteristics of string matching algorithms and similar-duplicate record detection algorithms. Next, it presents the overall design of the data warehouse, based on the features of the POI merchant information collected from Dianping, Tuan800 (Group 800), and QQ Food. The thesis then cleans the POI merchant information according to its specific quality problems, covering fields such as business name, business address, and business phone number. A standard administrative-division dictionary database is established to standardize business addresses, and a thesaurus of streets and landmarks is built up incrementally to serve as a standard reference for cleaning the business addresses of additional POI merchant data in the future. The cleaning results are then verified in ArcGIS using the standardized business addresses.
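The abstract compares string matching and similar-duplicate record detection algorithms but does not name the ones finally used. As a minimal sketch, assuming edit-distance (Levenshtein) similarity as the string matcher, the Python below flags two merchant records as likely duplicates when their phone numbers match exactly and their names are at least 80% similar. The threshold, the matching rule, the function names, and the sample records are hypothetical illustrations, not the thesis's method, and phone numbers are assumed to be already normalized to digits.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means identical strings."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def find_duplicates(records, threshold=0.8):
    """Return index pairs (i, j) of likely duplicate merchant records.

    Hypothetical rule: equal (pre-normalized) phone numbers AND
    name similarity >= threshold. Pairwise comparison is O(n^2),
    so large tables would need blocking (e.g. by phone) first.
    """
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            name_i, phone_i = records[i]
            name_j, phone_j = records[j]
            if phone_i == phone_j and similarity(name_i, name_j) >= threshold:
                pairs.append((i, j))
    return pairs


# Invented sample records: (business name, phone number)
records = [
    ("Lao Wang Noodle House", "02112345678"),
    ("Lao Wang Noodle Hous", "02112345678"),  # typo variant of record 0
    ("Golden Dragon Hotpot", "02187654321"),
]
print(find_duplicates(records))  # → [(0, 1)]
```

Requiring an exact phone match keeps false positives low for chain stores with identical names at different branches; a looser rule would weight several fields instead.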
In the ArcGIS verification, the latitude and longitude of each business address before and after cleaning are obtained through the address resolution (geocoding) interface provided by the Baidu API, and the points are plotted in ArcMap to visualize the effect of the cleaning. Finally, the two-dimensional latitude and longitude pairs are encoded as one-dimensional strings with GeoHash, which makes it convenient to query and push business information in location-based services, and the cleaned data are loaded into the data warehouse.

The thesis concludes with a summary of the work and an outlook on the further development and application of the POI information data warehouse.
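GeoHash encodes a coordinate by repeatedly bisecting the longitude and latitude intervals, interleaving the resulting bits (longitude first), and mapping each 5-bit group to a base-32 character, so nearby points tend to share a common string prefix. The Python below is a sketch of that standard encoding; `nearby` is a hypothetical helper showing how prefix matching could support location-based queries, and the merchant data are invented rather than taken from the thesis.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # GeoHash alphabet (no a, i, l, o)


def geohash_encode(lat: float, lon: float, precision: int = 8) -> str:
    """Encode a latitude/longitude pair as a GeoHash string of `precision` chars."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    even = True  # even-numbered bits refine longitude, odd ones latitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if even else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            bits = bits << 1
            rng[1] = mid  # keep the lower half
        even = not even
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


def nearby(user_lat, user_lon, merchants, precision=5):
    """Hypothetical helper: merchants whose GeoHash shares the user's prefix."""
    prefix = geohash_encode(user_lat, user_lon, precision)
    return [name for name, h in merchants if h.startswith(prefix)]


print(geohash_encode(42.6, -5.6, 5))  # → ezs42

# Invented merchants stored with 8-character GeoHashes
merchants = [
    ("Shop A", geohash_encode(42.601, -5.601, 8)),
    ("Shop B", geohash_encode(57.64911, 10.40744, 8)),
]
print(nearby(42.6, -5.6, merchants))  # → ['Shop A']
```

Because the code is an ordinary string, a warehouse table can index it and answer "what is near me" with a prefix query. Prefix matching alone misses merchants just across a cell boundary, so a location service would typically also search the eight neighboring cells.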
Keywords/Search Tags: Data Cleaning, Record Matching, Duplicate Record, Data Warehouse, GeoHash