| Remote sensing image application cases are the carriers of knowledge about how remote sensing images are applied, and the number of available cases directly determines the efficiency of task-driven remote sensing image discovery. The academic literature on remote sensing image applications records a large amount of such case knowledge, but it exists only as unstructured text, and no publicly annotated corpus is available. How to automatically extract the entities that make up remote sensing image application cases from this literature is therefore a key issue in building a large-scale knowledge base of remote sensing image application cases. This paper uses publicly available academic texts as the data source and extracts the entities of remote sensing image application cases from them. The main work consists of the following three parts: (1) Analysis of the academic text features of remote sensing image application case knowledge. The title, abstract structure, element composition, and entity description features of academic texts on remote sensing image applications are analyzed. The analysis summarizes the element composition and descriptive features of titles, the structural composition and element distribution of abstracts, and the textual description and contextual features of case entities in titles and abstracts. (2) A title-oriented unsupervised named entity recognition method for remote sensing image application cases. Because the titles of academic texts are concise and contain a limited set of elements, this paper proposes an unsupervised named entity recognition method for titles. First, because remote sensing image data entities are enumerable, a remote sensing ontology and its features are used to augment the samples; a C-LSTM discriminator is built on the augmented samples and combined with N-gram scoring and ranking to identify remote sensing image data entities. Second, because places and times are described in a relatively standardized way in titles, the Stanza model is used for initial recognition, and its results are then corrected with rules to identify temporal and spatial entities. Third, because task and method names in titles contain many out-of-vocabulary words, context-based recognition rules are built from their contextual features to identify task and method entities. (3) An abstract-oriented distantly supervised named entity recognition method for remote sensing image application cases. Because the abstracts of academic texts have a clear structure but diverse forms of description, this paper proposes a distantly supervised named entity recognition method for abstracts. First, to address the insufficient coverage of the annotation dictionary, an extended dictionary is built from the title recognition results, a manually built knowledge base, and the descriptive features of remote sensing image application case entities. Second, because labels obtained through distant supervision contain noise, a C-LSTM discriminator is built on the extended dictionary to pre-annotate the data, the labels are then corrected using abstract and entity features, and finally the noisy data are screened by rules and corrected manually. Third, a BiLSTM-CRF model with an early stopping mechanism is trained on the labeled data to recognize the entities. |
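The dictionary-driven pre-annotation step in part (3) can be illustrated with a minimal sketch. It is not the method described above: the C-LSTM discriminator is replaced here by a plain longest-match dictionary lookup, and the function name `bio_preannotate`, the label names `DATA`, `TASK`, and `LOC`, and the example sentence are all hypothetical. The sketch only shows how an entity dictionary can be turned into token-level BIO labels of the kind a sequence tagger such as a BiLSTM-CRF is usually trained on.

```python
from typing import Dict, List


def bio_preannotate(tokens: List[str], dictionary: Dict[str, str],
                    max_len: int = 5) -> List[str]:
    """Label tokens with BIO tags by longest-match lookup in an entity dictionary.

    `dictionary` maps an entity phrase (space-separated words) to a label.
    Matching is case-insensitive; tokens that match nothing are tagged "O".
    """
    # Index phrases as lowercase token tuples for exact span comparison.
    lookup = {
        tuple(word.lower() for word in phrase.split()): label
        for phrase, label in dictionary.items()
    }
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first, then shrink it.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tok.lower() for tok in tokens[i:i + n])
            label = lookup.get(key)
            if label is not None:
                tags[i] = "B-" + label
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + label
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return tags


# Hypothetical dictionary and sentence, used only for illustration.
example_dictionary = {
    "Landsat 8": "DATA",
    "land cover classification": "TASK",
    "Wuhan": "LOC",
}
example_tokens = (
    "Landsat 8 images were used for land cover classification in Wuhan".split()
)
example_tags = bio_preannotate(example_tokens, example_dictionary)
```

Labels produced this way are noisy, because a dictionary match ignores context; that noise is what the label correction and rule-based screening steps described above are meant to reduce.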