
Research On The Mining Technology Of Text Geographic Information In Web

Posted on: 2017-11-21  Degree: Master  Type: Thesis
Country: China  Candidate: S J Wang  Full Text: PDF
GTID: 2348330518470793  Subject: Software engineering
Abstract/Summary:
Geographic information plays a very important role in civil, commercial and national defense applications, but its acquisition is limited in many respects. Much geographic information now exists on the Internet, and acquiring it from the web has become an important way to break through the limits of traditional acquisition methods. However, web data is voluminous and its data types are complex, so obtaining geographic information from the web is very difficult. To address this problem, the paper studies the acquisition and classification of geographic information.
The paper proposes a theme web crawler algorithm combined with a geographic information ontology base. The algorithm assesses the relevancy of web content against the geographic information ontology base and, together with web link filtering and web link authority, uses that assessment to select web pages containing geographic information. The experimental results show that the proposed algorithm effectively filters out web pages unrelated to geographic information and increases the accuracy of web geographic information acquisition.
For the classification of geographic information, the paper proposes a nearest neighbor classification algorithm that incorporates a distance threshold. The algorithm assigns a sample to a type according to the spatial distance between the sample and each type core, subject to a preset distance threshold. The experimental results show that the proposed algorithm classifies geographic information effectively and with high accuracy. In addition, the classification results are used to verify that the Apriori algorithm can mine association rules from the geographic information.
Finally, the theme web crawler algorithm and the nearest neighbor classification algorithm are used to implement a web-oriented text geographic information mining system.
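The distance-threshold nearest neighbor idea described above can be illustrated with a minimal sketch. The abstract does not state the distance metric, how a type core is computed, or what happens to a sample that lies beyond the threshold, so the choices here are assumptions made for illustration only: Euclidean distance, the mean vector as the core, and `None` for an unassigned sample. The function names (`centroid`, `euclidean`, `classify`) and the toy data are hypothetical.

```python
import math


def centroid(vectors):
    """Mean vector of a list of equal-length feature vectors (assumed 'type core')."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(sample, class_cores, threshold):
    """Return the label of the nearest core, or None if it is farther than threshold."""
    best_label, best_dist = None, float("inf")
    for label, core in class_cores.items():
        d = euclidean(sample, core)
        if d < best_dist:
            best_label, best_dist = label, d
    # Reject the sample when even the nearest core is too far away.
    return best_label if best_dist <= threshold else None


# Toy two-dimensional feature vectors, invented for this example.
training = {
    "river": [[1.0, 0.0], [0.8, 0.2]],
    "city": [[0.0, 1.0], [0.2, 0.8]],
}
cores = {label: centroid(vectors) for label, vectors in training.items()}

print(classify([0.85, 0.1], cores, threshold=0.5))
print(classify([5.0, 5.0], cores, threshold=0.5))
```

In this sketch the threshold acts as a rejection rule: a sample far from every core is left unclassified rather than forced into the nearest type. Whether the thesis uses the threshold this way is not stated in the abstract.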
The system compares web text with the ontologies in the geographic information ontology base to assess relevancy. It selects and acquires web text that is highly relevant to geographic information, preprocesses it, extracts its features, uses the resulting feature set to convert each text into a vector in feature space, and then classifies it. By matching the text against a vocabulary of basic geographic information terms, the system extracts the required place information. The Apriori algorithm is then used to extract association rules from the geographic information. The system test results show that the web geographic information mining system designed in this paper implements web text acquisition, web text classification, text information extraction and association rule mining of geographic information.
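The association rule step can be sketched with a small, textbook-style Apriori implementation. This is not the thesis's code. The transactions (sets of geographic terms per document), the support and confidence thresholds, and the function names `apriori` and `rules` are all assumptions chosen for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        # Keep the candidates that meet the support threshold.
        level = {}
        for candidate in current:
            s = support(candidate)
            if s >= min_support:
                level[candidate] = s
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) for rules meeting min_confidence."""
    found = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                a = frozenset(antecedent)
                confidence = supp / frequent[a]
                if confidence >= min_confidence:
                    found.append((a, itemset - a, confidence))
    return found


# Hypothetical term sets extracted from four classified documents.
docs = [
    {"river", "bridge", "port"},
    {"river", "bridge"},
    {"river", "port"},
    {"mountain", "bridge"},
]
frequent = apriori(docs, min_support=0.5)
for antecedent, consequent, confidence in rules(frequent, min_confidence=0.8):
    print(set(antecedent), "->", set(consequent), confidence)
```

With these toy inputs the only rule that passes the confidence threshold is that documents mentioning "port" also mention "river". Real inputs would be the geographic terms extracted from the classified web texts.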
Keywords/Search Tags:Geographic information, Geographic ontology, Theme web crawler algorithm, K-nearest neighbor classification algorithm