Research On Extraction And Classification Of Web Textual Geographic Information

Posted on: 2018-09-19  Degree: Master  Type: Thesis
Country: China  Candidate: X S Wu  Full Text: PDF
GTID: 2348330542972260  Subject: Computer Science and Technology
Abstract/Summary:
As human society enters the era of big data, massive amounts of text, including textual geographic information, are flooding the Internet. Extracting valuable data from this information quickly and effectively has become one of the most popular research directions. However, an unorganized text collection has low utilization value, and large volumes of text slow human reading. Therefore, text classification and abstract extraction, which underpin data mining, information retrieval, and other fields, have attracted the attention of many researchers.

By analyzing a large amount of textual geographic information, this paper identifies three characteristics of such text: the topic is not clear, the connection between paragraphs is weak, and keywords occur with low frequency. As a result, when traditional abstract extraction methods based on sentence similarity and word frequency are applied to geographic information, the result usually misses a large amount of important information. To account for these characteristics, this paper improves the classical TextRank algorithm by incorporating the key concepts of the geographic information field, key phrases, and the key positions of sentences, and proposes an abstract extraction method based on the resulting GTextRank algorithm.

At present, because ready-made abstracts of web textual geographic information are scarce, traditional evaluation indexes such as accuracy rate, recall rate, and F value cannot be used directly to evaluate an abstracting method. Therefore, drawing on the key concepts of the geographic information field, this paper proposes three measures (the average percentage of key concepts, the average coverage, and the average M value) and uses them to evaluate the performance of abstract extraction methods.

The content of a text abstract usually contains important information that is highly related to the text topic. Meanwhile, the semantic feature space of a text not only has low dimensionality and low redundancy but also takes into account the information of the whole text, so it can improve the performance of text classification. Therefore, this paper proposes a text categorization method based on the collaboration of text summarization and text semantics, using the abstract extraction method described above and natural language processing technology.

Finally, a series of comparative experiments is carried out, using some recent research results as references. The experimental results show that the proposed methods for text categorization, abstract extraction, and abstract evaluation achieve improved performance.
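
The abstract does not give the GTextRank formula, so the following is only a minimal sketch of the general idea: classical TextRank over a sentence-similarity graph, with a bias toward sentences that mention key concepts or sit at a key position. The bias is implemented here as a personalized-PageRank prior, which is a swapped-in technique chosen for illustration and not necessarily what the thesis does. The names `rank_sentences` and `summarize`, the prior weights, and the choice of the first sentence as the "key position" are all assumptions.

```python
import math
import re


def tokenize(sentence):
    """Lowercase a sentence and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", sentence.lower()))


def similarity(a, b):
    """Classical TextRank similarity: word overlap normalized by log lengths."""
    if len(a) < 2 or len(b) < 2:
        return 0.0
    return len(a & b) / (math.log(len(a)) + math.log(len(b)))


def rank_sentences(sentences, key_concepts, damping=0.85, iterations=50):
    """Score sentences with TextRank plus a key-concept/position prior.

    key_concepts is a set of lowercase words. The prior (1 per sentence,
    +1 per key concept hit, +1 for the first sentence) is a hypothetical
    weighting, not taken from the thesis.
    """
    n = len(sentences)
    if n == 0:
        return []
    tokens = [tokenize(s) for s in sentences]
    # Symmetric edge weights between distinct sentences.
    w = [[similarity(tokens[i], tokens[j]) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in w]

    # Assumed prior: concept hits and leading position raise a sentence's weight.
    prior = [1.0 + len(tokens[i] & key_concepts) for i in range(n)]
    prior[0] += 1.0
    total = sum(prior)
    prior = [p / total for p in prior]

    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) * prior[i]
            + damping * sum(w[j][i] / out_sum[j] * scores[j]
                            for j in range(n) if out_sum[j] > 0)
            for i in range(n)
        ]
    return scores


def summarize(sentences, key_concepts, k=2):
    """Return the k highest-scoring sentences in their original order."""
    scores = rank_sentences(sentences, key_concepts)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Calling `summarize(sentences, {"river", "basin"}, k=2)` on a list of sentence strings returns two sentences in document order. Passing an empty concept set reduces the prior to the position bonus alone, which makes it easy to compare rankings with and without the domain bias.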
Keywords/Search Tags: Automatic abstracting, Text classification, Natural language processing, Similarity computation