Research On IP City-level Geolocation Based On Random Forest

Posted on:2021-03-16

Degree:Master

Type:Thesis

Country:China

Candidate:Y Q Lei

Full Text:PDF

GTID:2428330620963462

Subject:Applied Statistics

Abstract/Summary:


Since the start of the 21st century, the Internet has developed rapidly and become an indispensable tool in people's daily lives. With its growing popularity, online services and network communications have become commonplace. Personalized Internet services such as targeted advertising, automatic selection of web page language, and real-time local news push, as well as the tracing of network security incidents, all rely on IP geolocation technology, which determines a network host's geographic location from its unique IP address. Although many excellent IP geolocation technologies exist, each has limitations to some degree, such as the low accuracy of network measurements and the inability to accurately measure the relationships between variables. This thesis therefore proposes an IP city-level geolocation method based on data mining. The method uses the IP address itself as the feature source and trains a classifier with the random forest algorithm to obtain good prediction results.

The thesis studies and analyzes the classic existing IP geolocation methods, points out their shortcomings, and proposes an IP city-level geolocation model based on random forests. First, in the model design, data from different source databases are fused in order to obtain a high-precision IP training set, and a database fusion algorithm that uses a heap structure is introduced. The algorithm focuses mainly on fusing the attributes of each database's IP records. In the experiments, two different combinations of databases were selected. Comparative analysis showed that the second combination gave better results: the province information in the specific group could be identified, and the city recognition rate increased by 19 times.

Second, the thesis extracts IP data for 13 cities in Hubei Province, all belonging to the same operator, from the new fused database as training samples, and discards the traditional approach of using network measurement information such as delay and hop count as features. Instead, the four bytes of the IP address are used as the four features for model training, and both a single decision tree classifier and a random forest classifier are built. Experimental results show that the random forest model is better than the decision tree model to a certain extent, with a prediction accuracy of 97.89%. Finally, comparative analysis shows that it is feasible to perform machine learning classification with the IP address itself as the feature. In addition, for geolocation based on domestic IP data, the random forest algorithm outperforms the naive Bayes algorithm to a certain extent.
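The feature-extraction step described in the abstract, in which the four bytes of an IPv4 address serve as the four model features, can be sketched as follows. This is only a minimal illustration: the function name `ip_to_features`, the sample addresses (taken from documentation-reserved ranges), and the city labels are all hypothetical and do not come from the thesis.

```python
import ipaddress


def ip_to_features(ip: str) -> list:
    """Return the four bytes of an IPv4 address as four integer features."""
    # IPv4Address.packed is the 4-byte big-endian form of the address;
    # iterating over a bytes object yields integers in the range 0..255.
    return list(ipaddress.IPv4Address(ip).packed)


# Hypothetical labelled records: (IP address, city label).
samples = [
    ("192.0.2.17", "city_a"),
    ("198.51.100.4", "city_b"),
]

# Feature matrix (one row of four byte values per IP) and label vector.
X = [ip_to_features(ip) for ip, _ in samples]
y = [city for _, city in samples]
```

Feature rows and labels in this shape could then be passed to a decision tree or random forest implementation, for example scikit-learn's `RandomForestClassifier`. The abstract does not say which implementation or hyperparameters the author used, so that choice is an assumption.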

Keywords/Search Tags:

IP geolocation, IP database, feature, random forest


Related items

1	Prediction Of Road Traffic Concentration Using Random Forest Algorithm Based On Feature Compatibility
2	Research On Random Forest Algorithm Based On Feature Selection And Diversity
3	Research On IP City-level Geolocation Based On Network Topology Clustering
4	Research And Implementation Of Domestic IPv6 Geolocation
5	Research On Feature Selection Method Based On Random Forest
6	Research On Detection Of Abnormal Mobile Communication Users Based On Improved Random Forest
7	Research On IP Address Location Technology Based On Neighbor Relationship
8	Facial Expression Recognition Based On WMCBP-WWEF Feature Fusion Using Random Forest
9	Research On Feature Selection And Classification Method Based On Random Forest For Medical Datasets
10	Random Forest Feature Selection