Research On Extraction And Classification Of Web Textual Geographic Information

Posted on:2018-09-19

Degree:Master

Type:Thesis

Country:China

Candidate:X S Wu

Full Text:PDF

GTID:2348330542972260

Subject:Computer Science and Technology

Abstract/Summary:

As human society enters the era of big data, massive amounts of text, including textual geographic information, are flooding the Internet. How to extract valuable data from this information quickly and effectively has become one of the most popular research directions. However, an unorganized text collection has low utilization value, and large volumes of text slow human reading. As a foundation for data mining, information retrieval and other fields, text classification and abstract extraction have therefore attracted the attention of many experts. By analyzing a large amount of textual geographic information, this paper summarizes three of its characteristics: the topic is unclear, the connection between paragraphs is weak, and keywords occur with low frequency. Consequently, when traditional abstract extraction methods based on sentence similarity and word frequency are applied to geographic information, the result usually misses a large amount of important information. In view of these characteristics, this paper improves the classical TextRank algorithm by combining key concepts in the field of geographic information, key phrases and the key positions of sentences, and proposes an abstract extraction method based on the GTextRank algorithm. At present, because ready-made abstracts of web textual geographic information are scarce, traditional abstract evaluation indexes such as precision, recall and F value cannot be used directly to evaluate an abstracting method. Therefore, drawing on the key concepts in the field of geographic information, this paper proposes the average percentage of key concepts, the average coverage and the average M value, and uses them to evaluate the performance of abstract extraction methods. The content of a text abstract usually contains important information that is highly related to the text topic. Meanwhile, the semantic feature space of a text not only has low dimensionality and low redundancy but also takes the whole information of the text into account, so it can improve the performance of text classification. Therefore, this paper proposes a text categorization method based on the collaboration of text summarization and text semantics, using the method described above and natural language processing technology. Finally, a series of comparative experiments is carried out, with some recent research results selected as references. The experimental results show that the proposed methods of text categorization, abstract extraction and evaluation achieve improved performance.
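
The abstract names the ingredients of GTextRank (domain key concepts, key phrases, sentence position) but does not give its formulas. As a rough illustration only, the sketch below implements classical TextRank sentence ranking and adds a hypothetical bias toward sentences that contain key concepts or open the text. The function names, the `concept_boost` and `lead_boost` parameters, and the way the bias enters the teleport term are all assumptions made for this example, not the thesis's method.

```python
import math
import re


def tokenize(sentence):
    """Lowercase alphanumeric tokens; a stand-in for a real word segmenter."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def similarity(a, b):
    """Classical TextRank similarity: shared words over summed log lengths."""
    if len(a) < 2 or len(b) < 2:
        return 0.0  # avoids a zero denominator for one-word sentences
    shared = len(set(a) & set(b))
    return shared / (math.log(len(a)) + math.log(len(b)))


def rank_sentences(sentences, key_concepts=frozenset(), concept_boost=0.5,
                   lead_boost=0.5, damping=0.85, iterations=50):
    """Score sentences with TextRank plus an assumed concept/position bias."""
    tokens = [tokenize(s) for s in sentences]
    n = len(tokens)
    if n == 0:
        return []
    # Weighted, undirected sentence graph without self-loops.
    weights = [[similarity(tokens[i], tokens[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]

    # Hypothetical prior: more weight for key-concept hits and the first sentence.
    prior = []
    for i, toks in enumerate(tokens):
        hits = sum(1 for t in toks if t in key_concepts)
        prior.append(1.0 + concept_boost * hits + (lead_boost if i == 0 else 0.0))
    total = sum(prior)
    prior = [p / total for p in prior]

    # Fixed number of power-iteration steps; the prior replaces the uniform
    # teleport term of the classical algorithm.
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) * prior[i] + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0)
            for i in range(n)
        ]
    return scores


def summarize(sentences, k=2, **kwargs):
    """Return the k highest-scoring sentences in their original order."""
    scores = rank_sentences(sentences, **kwargs)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Calling `summarize(sentences, k=2, key_concepts={"delta"})` would favor sentences that mention the lowercase token `delta`. Setting `concept_boost` and `lead_boost` to 0 makes the prior uniform, which recovers the classical, unbiased ranking.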

Keywords/Search Tags:

Automatic abstracting, Text classification, Natural language processing, Similarity computation

Related items

1	Design And Implementation Of Clickbait News Detect System Based On Natural Language Processing
2	Research On Text Classification Based On Natural Language Processing And Machine Learning
3	Research And Application Of Text Classification Based On Natural Language Processing
4	Semantic Similarity Computation And Application For Text Based On HNC Theory
5	Intelligent Device Text Classification Method Based On Natural Language Processing
6	The Research Of HowNet Based Word Similarity Computation And Its Application
7	Research On Text Similarity Based On Bert
8	Research And Application Of Short Text Similarity Algorithm Based On Semantic Dependency Tree
9	Text Classification Based On Natural Language Processing, Analysis And Research
10	Research On Text Representation Model And Application In Text Classification And Natural Language Inference