
Research On Key Technologies About Geolocation Analysis Of Network Endpoints

Posted on: 2024-06-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: R X Li
Full Text: PDF
GTID: 1528307100473394
Subject: Cyberspace security
Abstract/Summary:
A network endpoint is a physical device connected to the Internet, such as a mobile device, desktop computer, or server. Network endpoint geolocation analysis establishes the relationship between a network endpoint and a reliable geographic location by obtaining, inferring, and evaluating geographic data. It can be used to determine the geographic location of endpoint users and is widely applied to strengthen network security management, increase business profits, and improve the efficiency of social governance, so it has significant practical and theoretical value. However, obtaining reliable, high-precision geographic locations of network endpoints, especially mobile ones, under non-cooperative conditions is difficult for four reasons: location data is sparse and false information is abundant in massive network data; the mapping between an endpoint's attributes and its geographic location is unclear; the IPs of mobile endpoints are difficult to measure; and the historical location points of endpoint devices are sparse. Existing methods mainly focus on obtaining and evaluating the geographic locations of fixed network endpoints and on geolocating reachable IPs. Under non-cooperative conditions, they can obtain only a certain number of reliable, high-precision locations of fixed network endpoints, and they struggle to obtain reliable, high-precision locations of mobile network endpoints. This thesis focuses on the key technologies for analyzing the geographic location of network endpoints under non-cooperative conditions. Our main contributions are as follows:

1. Existing methods are easily restricted by anti-crawling mechanisms and can obtain only a limited number of fixed endpoints with reliable geographic locations. To address this, we proposed a method for obtaining a large number of reliable, high-precision geographic locations of fixed network endpoints. First, we used a machine learning algorithm to find IPs that host Web, Email, or other services, obtained the geolocation of these IPs through reverse DNS lookups and by associating each server IP with the geographic location of its institution, and thereby established the mapping between servers and geographic locations. Then, on the basis of the router-level network topology, we constructed a new topology (the cluster-level network topology) by merging twin routing nodes with similar functions and nearby locations. Finally, based on the cluster-level topology, we evaluated the reliability of each <server endpoint, high-precision geographical location> data pair. Experiments were conducted in 18 cities worldwide, including Beijing, Tokyo, New York, and Lagos. The results showed that the machine learning algorithm obtains more fixed endpoints with high-precision geographic locations than existing webpage-based methods. Moreover, compared with evaluation based on IP-level or router-level topology, the evaluation method based on cluster-level topology selects more reliable data pairs from the candidate <fixed endpoint, high-precision geolocation> data pairs.

2. Existing methods depend on network connectivity, which makes it difficult to geolocate unreachable mobile endpoint IPs with high precision. To address this, we proposed an IP geolocation method for mobile network endpoints based on district anchors. First, we introduced the concept of a district anchor and proposed a two-stage clustering algorithm based on IP and geographic location (the IPG2C algorithm) to obtain district anchors. Then, we established a reliability evaluation mechanism for district anchors based on the number, IP distribution, and spatial distribution of the elements in each cluster. Finally, based on the reliable district anchors, we used a "subnet geolocation" strategy to determine the geographic location of a mobile IP. Experiments were conducted in cellular networks in 10 cities, including Beijing, Shanghai, and Shenzhen. The results showed that district anchors obtained by clustering can be used to geolocate mobile IPs with high precision. With the district anchors obtained by our IPG2C algorithm, the minimum geolocation error was only 13 meters and the average geolocation error was 12.47 kilometers. Compared with 13 classic clustering methods, such as K-Means++, DBSCAN, and GMM, the IPG2C algorithm reduced the average geolocation error by 26.62% to 50.77%.

3. Existing methods identify the brand and model of mobile network endpoint devices with low accuracy, which makes them difficult to use for evaluating the stability of mobile network endpoint geolocation. To address this, we proposed a stability analysis method for mobile network endpoint geolocation based on brand and model identification. First, we constructed a feature set containing 20 fixed attributes, such as GPU model, resolution, and operating system. The features were divided into two types, string features and numeric features, and we designed similarity calculation rules for each type. Then, we designed feature importance measurement strategies to quantify the role of each feature in identification and determined the feature weights accordingly. Finally, we extracted the corresponding features of the target mobile endpoint and calculated the similarity between the target device and the mobile endpoint devices in the knowledge base to determine the brand and model of the target. Experiments were conducted on 587 models from 17 brands, such as Apple, Samsung, and Xiaomi. The results showed that our method identifies the brand and model of mobile endpoints more accurately than existing methods. Even when using only features that can be extracted from application-level traffic, the average accuracy was 97.51% for brand identification and 82.91% for model identification. In addition, we analyzed the relationship between device type and the geographic location stability of mobile network endpoints. The analysis showed that this stability varies across brands and models. We conclude that identifying the brand and model of a mobile network endpoint can be used to analyze the stability of its geographic location.

4. Existing methods can hardly extract movement patterns from sparse historical GPS points, which increases the inference error. To address this, we proposed a geographic coordinate inference method based on sparse historical GPS points. First, we set an expiration threshold to determine whether each historical GPS point is still valid and filtered out the invalid ones. Then, based on the geographic distances between the valid GPS points and the time difference between each point's acquisition time and the inference time, we established a spatial characteristic measurement equation and a temporal importance measurement equation, and from these determined the weight of each historical location point at the inference time. Finally, we established a geographic coordinate inference model, used it to infer the coordinates of the mobile network endpoint, and evaluated the reliability of the inferred coordinates based on the number and spatiotemporal distribution of the historical GPS points and the brand of the mobile endpoint. Experiments were conducted on a simulation dataset, the Geolife trace dataset, and the Weedend check-in dataset. The results showed that our method can infer the geographic coordinates of mobile network endpoints at any future time from sparse historical GPS points, with a smaller inference error than three typical existing methods. The average inference error on the check-in dataset was less than 1.8 kilometers.

Finally, we summarize the thesis and discuss prospects for future work.
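
The abstract does not state the spatial characteristic or temporal importance equations used in contribution 4, so the following Python sketch only illustrates the general shape of the pipeline (expiration filter, per-point weights, weighted coordinate estimate) under assumed forms. The half-life decay for the temporal weight, the inverse mean-distance form for the spatial weight, and all names and parameters (`GpsPoint`, `infer_coordinate`, `expiry_s`, `half_life_s`, `scale_km`) are hypothetical and are not taken from the thesis.

```python
import math
from dataclasses import dataclass


@dataclass
class GpsPoint:
    lat: float        # degrees
    lon: float        # degrees
    timestamp: float  # acquisition time, seconds


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers (mean Earth radius 6371 km)."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def infer_coordinate(points, infer_time, expiry_s, half_life_s, scale_km):
    """Return an inferred (lat, lon), or None if no valid point remains.

    Assumed weighting (not the thesis's equations):
      temporal = 0.5 ** (age / half_life_s)
      spatial  = 1 / (1 + mean distance to the other valid points / scale_km)
    """
    # Step 1: expiration threshold, drop points that are too old or in the future.
    valid = [p for p in points if 0 <= infer_time - p.timestamp <= expiry_s]
    if not valid:
        return None

    # Step 2: one weight per valid point from its temporal and spatial terms.
    weights = []
    for p in valid:
        age = infer_time - p.timestamp
        temporal = 0.5 ** (age / half_life_s)
        others = [q for q in valid if q is not p]
        if others:
            mean_dist = sum(
                haversine_km(p.lat, p.lon, q.lat, q.lon) for q in others
            ) / len(others)
        else:
            mean_dist = 0.0
        spatial = 1.0 / (1.0 + mean_dist / scale_km)
        weights.append(temporal * spatial)

    # Step 3: weighted mean of coordinates. This is a small-area approximation
    # and is not valid for point sets that straddle the antimeridian or a pole.
    total = sum(weights)
    lat = sum(w * p.lat for w, p in zip(weights, valid)) / total
    lon = sum(w * p.lon for w, p in zip(weights, valid)) / total
    return lat, lon


if __name__ == "__main__":
    history = [
        GpsPoint(39.90, 116.40, 0.0),
        GpsPoint(39.92, 116.42, 3000.0),
        GpsPoint(10.00, 10.00, -100000.0),  # older than the threshold, filtered out
    ]
    print(infer_coordinate(history, infer_time=3600.0, expiry_s=86400.0,
                           half_life_s=3600.0, scale_km=1.0))
```

In this sketch the more recent of the two valid points pulls the estimate toward itself, and the expired point has no influence. A reliability score based on the number and spread of the valid points, as the abstract describes, could be computed from the same `valid` list but is omitted here.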
Keywords/Search Tags: Geolocation Acquisition and Evaluation, IP Geolocation, Geolocation Inference, Cluster-level Network Topology, Endpoint Identification