
Research On Categorization Of Textual Geographic Information From Web

Posted on: 2018-06-03  Degree: Master  Type: Thesis
Country: China  Candidate: X Li  Full Text: PDF
GTID: 2348330542472263  Subject: Computer Science and Technology
Abstract/Summary:
Geographic information plays an important role in many fields, such as science, everyday life, and the military. With the rapid development of the Internet, geographic information can now be obtained from the Web. It includes not only natural geographic information but also human geography information, economic geography information, and so on. However, the geographic information obtained from the Internet is enormous in volume and disorganized, so it is difficult to find what is needed directly. The goal of this thesis is therefore to classify textual geographic information obtained from the Web and to extract text summaries based on that classification, so as to facilitate the storage and use of textual geographic information. The research work of this thesis is as follows:

(1) Feature selection is an important step before text categorization. The traditional information gain method considers only how a feature term is distributed across the texts in the text set, and does not consider how frequently the term appears within a text. This thesis proposes adding a word-frequency factor to feature selection. In addition, according to the characteristics of geographic information, a geographic correlation factor is added to feature selection. A comparison of the improved feature selection method with the traditional information gain method shows that the improved method is more effective.

(2) The traditional k-nearest-neighbor classification algorithm treats the k nearest neighbor texts as equal, without taking into account the weights of the individual neighbors. This thesis proposes using the artificial bee colony algorithm to compute optimal weights for the k nearest neighbor texts, and adds the simulated annealing algorithm to the artificial bee colony algorithm to keep it from becoming trapped in a local optimum. The improved method is then compared with the traditional k-nearest-neighbor algorithm in terms of classification precision and recall. The results show that the improved k-nearest-neighbor method performs better.

(3) Automatic text summarization builds on the classification of textual geographic information. Starting from the traditional method of extracting a summary based on sentence features, we add a check for geography-related words and an election model, applied after scoring, to rank the sentences, and then select sentences for the summary according to the ranking. A comparison of the improved automatic summarization method with the traditional sentence-feature method shows that the improved method extracts better text summaries.
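The contrast drawn in point (2), equal votes versus weighted neighbors, can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the toy documents, and the choice to index weights by neighbor rank are all assumptions made for illustration, and the weights are fixed by hand here, whereas the thesis obtains them with an artificial bee colony search combined with simulated annealing.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(query, training, k, weights=None):
    """Classify `query` by a vote among its k most similar training texts.

    training: list of (term-weight dict, label) pairs.
    weights:  optional list of vote weights, one per neighbor rank
              (hypothetical scheme; the abstract does not say how the
              optimized weights are indexed). None means equal votes,
              i.e. the traditional algorithm.
    """
    ranked = sorted(training, key=lambda item: cosine(query, item[0]),
                    reverse=True)[:k]
    if weights is None:
        weights = [1.0] * len(ranked)
    votes = defaultdict(float)
    for (_, label), weight in zip(ranked, weights):
        votes[label] += weight
    return max(votes, key=votes.get)


# Toy data (invented): one very close neighbor and two distant ones.
training = [
    ({"river": 1.0, "basin": 1.0}, "hydrology"),
    ({"river": 1.0, "trade": 1.0}, "economy"),
    ({"basin": 1.0, "port": 1.0, "trade": 1.0}, "economy"),
]
query = {"river": 1.0, "basin": 1.0}

equal_vote = knn_classify(query, training, k=3)
weighted_vote = knn_classify(query, training, k=3, weights=[0.6, 0.2, 0.1])
```

With equal votes the two distant "economy" texts outvote the single closest text. With the hand-picked weights the closest neighbor dominates and the label changes to "hydrology". This sensitivity to the weights is what makes searching for good weights worthwhile.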
Keywords/Search Tags: Textual geographic information, Feature selection, Text categorization, Artificial bee colony algorithm, Automatic text summarization