
Research on Internet Map Classified Geographic Information Detection Algorithm Based on Spark

Posted on: 2020-10-16
Degree: Master
Type: Thesis
Country: China
Candidate: Y S Cui
GTID: 2370330590471492
Subject: Information and Communication Engineering

Abstract/Summary:
With the rapid development of information technology, geographic information on the Internet is carried in an increasing variety of types and forms, and it spreads through increasingly diverse channels. At the same time, the security of geographic information has long been a key national concern. The sheer volume of information on the Internet poses a serious challenge to the detection of classified geographic information. This thesis therefore proposes a Spark-based method for detecting sensitive geographic information in Internet maps. The method can reduce the supervision workload of map reviewers and improve the efficiency of sensitive geographic information detection, which is of great significance for safeguarding national geographic information security. The main research contents of this thesis are as follows.

First, the Internet contains a large amount of geographic information, while existing sensitive geographic information detection methods rely on a single approach and achieve poor detection accuracy. To address this, a method for detecting sensitive geographic information in Internet maps is proposed. For maps on the Internet and their ancillary information, the Ansj algorithm is first used to extract the set of geographic feature words. Second, single and combined sensitive vocabularies are constructed; the similarity between the feature words and the sensitive words is computed with a word similarity method, and the sensitive words in the feature word set are extracted. Third, the characteristics of sensitive geographic information are analyzed, and three features are extracted for each feature word: its location attribute in the text, its weight in the text, and the sensitivity coefficient of the geographic information it corresponds to. Finally, the sensitivity of the geographic information is calculated from the similarities of the extracted sensitive words and their corresponding features, in order to detect whether an Internet map contains sensitive geographic information. Simulation results show that the proposed algorithm achieves higher accuracy, recall, and F-measure than existing sensitive geographic information detection methods.

Second, because the volume of network data is large and information spreads rapidly, sensitive geographic information detection algorithms must be highly efficient. To address this, a Spark-based parallel sensitive geographic information detection method is proposed. First, a parallel sensitive dictionary is constructed and broadcast to every node in the cluster. Second, the geographic information stored in HDFS is converted into a Resilient Distributed Dataset (RDD), and the feature word sets are parsed in parallel through the Map operation. The similarity between the feature words and the sensitive words is computed on each node, and the sensitive words in the feature word set are extracted. Then the corresponding features of the feature words are extracted from the geographic information, and the geographic information sensitivity is calculated on each node. Finally, the results are collected at the driver node, which computes the sensitivity of the entire map file and stores it in HDFS. Experimental results show that the parallelized algorithm runs significantly faster than in stand-alone mode, and that the cluster is easy to scale out by adding nodes.
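The abstract does not specify the word similarity method, the feature weights, or how per-record scores are combined into a map-file score, so the following is only a minimal Python sketch under stated assumptions, not the thesis's implementation. Whitespace tokenization stands in for Ansj segmentation; a character-bigram Dice coefficient stands in for the word similarity method; the position weights, sensitivity coefficients, the 0.8 similarity threshold, and taking the maximum record score as the file score are all hypothetical, as are the function names. A `ThreadPoolExecutor` plays the role of the Spark executors: the read-only vocabularies shared by every worker mimic a broadcast variable, `pool.map` mimics the RDD map, and the final aggregation mimics collection at the driver.

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative assumptions throughout: weights, coefficients, threshold and
# aggregation rule are hypothetical, not values from the thesis.
POSITION_WEIGHT = {"title": 1.5, "body": 1.0}  # words in the title count more


def tokenize(record):
    """Whitespace tokenizer standing in for Ansj segmentation.

    Returns (word, field) pairs so the position attribute of each word is kept.
    """
    return [(w.lower(), field)
            for field in ("title", "body")
            for w in record.get(field, "").split()]


def bigrams(word):
    """Character bigrams; a one-letter word is its own single 'bigram'."""
    return {word[i:i + 2] for i in range(len(word) - 1)} or {word}


def similarity(a, b):
    """Dice coefficient over character bigrams (stand-in word similarity)."""
    x, y = bigrams(a), bigrams(b)
    return 2 * len(x & y) / (len(x) + len(y))


def record_sensitivity(record, single, combined, threshold=0.8):
    """Score one text record against single and combined sensitive vocabularies."""
    tokens = tokenize(record)
    if not tokens:
        return 0.0
    n = len(tokens)
    score = 0.0
    # Single entries: each matching occurrence adds similarity * position
    # weight * sensitivity coefficient, divided by text length (term weight).
    for word, field in tokens:
        for entry, coeff in single.items():
            sim = similarity(word, entry)
            if sim >= threshold:
                score += sim * POSITION_WEIGHT[field] * coeff / n
    # Combined entries: counted once, only if every part matches some word.
    words = {w for w, _ in tokens}
    for parts, coeff in combined.items():
        part_sims = [max(similarity(w, p) for w in words) for p in parts]
        if min(part_sims) >= threshold:
            score += min(part_sims) * coeff
    return score


def map_file_sensitivity(records, single, combined, workers=4):
    """Score records in parallel, then aggregate on the 'driver'.

    The vocabularies are shared read-only by all workers, like a Spark
    broadcast variable; pool.map plays the role of an RDD map.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(
            lambda r: record_sensitivity(r, single, combined), records))
    # Hypothetical aggregation: the file is as sensitive as its worst record.
    return max(scores, default=0.0), scores
```

For example, with a single vocabulary `{"radar": 1.0}` and a combined entry `("reservoir", "dam")`, a record titled "Radar station" whose body mentions "reservoir dam" is flagged, while one titled "City park" scores zero. On a real cluster the same structure would map onto PySpark's `SparkContext.broadcast` for the vocabularies, `RDD.map` for per-record scoring, and an action such as `RDD.max` or `collect` on the driver; the scoring function itself would not need to change.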
Keywords/Search Tags: sensitivity calculation, sensitive geographic information, Internet map, Spark, big data