
Research On IP Address Location Technology Based On Neighbor Relationship

Posted on: 2021-08-11
Degree: Master
Type: Thesis
Country: China
Candidate: Y Cai
GTID: 2518306512987359
Subject: Computer application technology
Abstract/Summary:
With the continuous development of Internet technology, the number of Internet users has grown rapidly. Because many network services require precise positioning of users, research on IP geolocation is becoming increasingly important. IP geolocation, that is, determining the geographical location of a network device from its IP address, is usually achieved either by querying an existing IP address database or by running an IP geolocation algorithm. At present, the data quality of existing domestic IP address databases varies, and most of them suffer from low positioning precision and large deviations. Traditional IP geolocation algorithms, for their part, have low positioning accuracy and high model complexity, and are difficult to put into practical use. To address this, this thesis uses IP address data from Jiangsu Province to design and implement an IP geolocation algorithm based on neighbor relationships, which aims to achieve district-level and street-level positioning of IP addresses by combining machine learning and spatial theory. The specific work includes:

(1) Building a multi-source fusion IP address database. Because domestic mainstream IP address databases have poor positioning accuracy, and their data and formats differ from one another, this thesis weighs the reliability and coverage of each IP address database and first builds a multi-source fusion IP address database of Jiangsu Province.

(2) Extracting multi-dimensional features of IP addresses. Based on the constructed IP address database, the characteristics of individual IP addresses and the relationships between IP addresses are analyzed and combined with data obtained by active measurement. The resulting IP address features provide the data basis for the subsequent research.

(3) Proposing an IP city-level geolocation algorithm based on random forest. The traditional naive Bayes IP geolocation algorithm considers only population density. To overcome this shortcoming, this thesis proposes using a decision tree to model the feature parameters extracted earlier. Further, because a single decision tree reaches only a local optimum and is prone to overfitting, the random forest algorithm is introduced and its parameters are optimized, yielding an IP city-level geolocation algorithm based on random forest. Experiments show that this algorithm has a running time similar to that of the traditional algorithm, while its positioning accuracy is greatly improved.

(4) Proposing an IP geolocation algorithm based on neighbor relationships. The traditional delay-based IP geolocation algorithm considers only delay. To overcome this shortcoming, this thesis combines the aggregation characteristics of IP address allocation with the characteristics of the network topology to define the neighbor relationship of IP addresses, and then proposes an IP district-level and street-level geolocation algorithm based on neighbor relationships. The algorithm first uses the IP city-level geolocation algorithm to determine the city in which the target IP address is located. It then uses neighbor relationships to select, within that city, a set of reference points that are close to the target. Finally, it determines the geographic location of the IP address by means of spatial geometry. Experiments show that this algorithm can reduce the positioning error of some IP addresses to within 2 kilometers. Compared with traditional algorithms and mainstream IP address databases, its positioning precision is also higher, and it better determines the districts and streets where the IP addresses are located.
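
The neighbor-based step described in (4) can be illustrated with a minimal Python sketch. The abstract does not state how the thesis actually defines the neighbor relationship or which spatial-geometry method it applies, so everything below is an assumption made for illustration only: closeness between two IPv4 addresses is approximated by the length of their shared bit prefix, and the target's position is taken as the centroid of the `k` closest reference points. The names `Landmark`, `common_prefix_len`, `estimate_location`, and `k` are hypothetical and do not come from the thesis.

```python
import ipaddress
from typing import List, Tuple

# Hypothetical reference point: (IPv4 address, latitude, longitude),
# assumed to lie in the city already chosen by the city-level step.
Landmark = Tuple[str, float, float]


def common_prefix_len(ip_a: str, ip_b: str) -> int:
    """Return how many leading bits two IPv4 addresses share (0 to 32)."""
    a = int(ipaddress.IPv4Address(ip_a))
    b = int(ipaddress.IPv4Address(ip_b))
    # The highest set bit of the XOR marks the first position where they differ.
    return 32 - (a ^ b).bit_length()


def estimate_location(
    target_ip: str, landmarks: List[Landmark], k: int = 3
) -> Tuple[float, float]:
    """Estimate (lat, lon) as the centroid of the k landmarks whose
    addresses share the longest prefix with target_ip.

    This prefix rule is an assumed stand-in for the thesis's neighbor
    relationship, which also draws on network topology.
    """
    if not landmarks:
        raise ValueError("need at least one landmark")
    if k < 1:
        raise ValueError("k must be at least 1")
    ranked = sorted(
        landmarks,
        key=lambda lm: common_prefix_len(target_ip, lm[0]),
        reverse=True,
    )
    nearest = ranked[:k]
    # Plain averaging of coordinates is a reasonable approximation only
    # over short, within-city distances.
    lat = sum(lm[1] for lm in nearest) / len(nearest)
    lon = sum(lm[2] for lm in nearest) / len(nearest)
    return lat, lon
```

Calling `estimate_location(target_ip, landmarks, k=3)` with landmarks from a single city returns one coordinate pair. A fuller treatment along the lines of the abstract would presumably weight reference points by measured delay or topological distance rather than by address prefix alone.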
Keywords/Search Tags: IP geolocation, Active measurement, Machine learning, Random forest, Multiple regression