Design And Implementation Of Data Fusion Algorithm For Multi-Source IP Geolocation Databases

Posted on: 2020-08-24
Degree: Master
Type: Thesis
Country: China
Candidate: B Xie
GTID: 2428330572473678
Subject: Computer Science and Technology
Abstract/Summary:
An IP geolocation database records IP addresses and their geographic locations. An IP address is a logical address of a network or host on the Internet, and there is no strong connection between an IP address and its geographic location, which makes it difficult to construct and maintain an accurate IP geolocation database. At present there are many commercial and free IP geolocation databases, but most of them suffer from inaccurate or missing geolocations. Moreover, there is no effective method for evaluating an IP geolocation database. It is therefore of theoretical and practical significance to study an effective model for evaluating the accuracy of IP geolocation databases and to build a more accurate database using data fusion.

Most existing evaluation models for IP geolocation databases are based on sampling. However, since there is no authoritative IP geolocation database in the industry and the geographic location of an IP address changes frequently, the verification datasets used by this approach are costly to construct, small, and subject to large deviation. In this thesis, a comprehensive evaluation and fusion model for IP geolocation databases is proposed, and an evaluation and fusion system based on it is implemented on a big data platform. The model does not rely on a specific dataset and can systematically quantify the accuracy of the selected IP geolocation databases. A fusion method is then used to integrate inconsistent locations into a single location, with which a fused IP geolocation database can be built.

The main research work is as follows. First, multiple commonly used IP geolocation databases are selected, and the delays of all IP addresses are measured. Then, based on the data correlation among the databases and a delay similarity calculation method, the accuracy of each IP geolocation database is evaluated. After that, the fusion result is determined by delay similarity filtering and weighted voting. Finally, a more accurate fused IP geolocation database is constructed. The system supports large-scale storage, offers high processing efficiency, and scales well.

This thesis selects five IP geolocation databases commonly used in China and uses the system to evaluate and fuse the geolocation information they assign to 340 million IP addresses in mainland China. To evaluate the method, true IP-to-geolocation mapping data from a domestic ISP (Internet Service Provider) is used as the verification dataset. The experimental results show that the accuracy of the fused geolocation database is improved by 8.79% compared with the existing evaluation model.
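
The fusion step named in the abstract (delay similarity filtering followed by weighted voting) can be illustrated with a minimal Python sketch. The abstract does not give the thesis's formulas, so everything here is an assumption made for illustration: the function names `delay_similarity` and `fuse_location`, the similarity measure (the inverse of one plus the mean absolute RTT difference), the `min_similarity` threshold, the per-location reference delay vectors, and all sample values are hypothetical and are not taken from the thesis.

```python
from collections import defaultdict


def delay_similarity(a, b):
    """Hypothetical similarity between two RTT vectors (ms) from the same probes.

    Returns a value in (0, 1]; 1.0 means the delays are identical.
    This measure is an assumption, not the thesis's definition.
    """
    if len(a) != len(b) or not a:
        raise ValueError("delay vectors must be non-empty and of equal length")
    mean_abs_diff = sum(abs(x - y) for x, y in zip(a, b)) / len(a)
    return 1.0 / (1.0 + mean_abs_diff)


def fuse_location(candidates, weights, target_delays, reference_delays,
                  min_similarity=0.1):
    """Pick one location for an IP from conflicting database answers.

    candidates:       {database name: location claimed for the IP}
    weights:          {database name: accuracy score from the evaluation step}
    target_delays:    RTT vector measured for the IP
    reference_delays: {location: RTT vector assumed typical for that location}
    """
    votes = defaultdict(float)
    for db, location in candidates.items():
        reference = reference_delays.get(location)
        # Filtering: drop a candidate whose reference delays are too
        # dissimilar from the delays measured for the target IP.
        if reference is not None and \
                delay_similarity(target_delays, reference) < min_similarity:
            continue
        # Weighted voting: each database votes with its accuracy score.
        votes[location] += weights.get(db, 0.0)
    if not votes:
        return None  # every candidate was filtered out
    return max(votes, key=votes.get)


# Made-up example data for a single IP address.
candidates = {"db_a": "Beijing", "db_b": "Beijing",
              "db_c": "Shanghai", "db_d": "Guangzhou"}
weights = {"db_a": 0.6, "db_b": 0.5, "db_c": 0.9, "db_d": 0.95}
reference_delays = {"Beijing": [12, 28], "Shanghai": [11, 33],
                    "Guangzhou": [60, 80]}
print(fuse_location(candidates, weights, [10, 30], reference_delays))
```

In this made-up example the highest-weighted database is excluded by the filter because its claimed location has very different delays, and the remaining candidates are decided by summed weights. A real system would need to choose how reference delays are obtained and how ties are broken, neither of which the abstract specifies.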
Keywords/Search Tags:IP geolocation database, data fusion, delay similarity, active measurement