Research And Application Of Knowledge Extraction For Government Affairs

Posted on: 2022-08-16
Degree: Master
Type: Thesis
Country: China
Candidate: T T Hu
GTID: 2516306530980749
Subject: Computer technology
Abstract/Summary:
With the rapid development of emerging technologies such as big data and artificial intelligence, the concept of digital government has attracted growing attention, and using these technologies to build an intelligent, service-oriented government has become a trend. As the information society develops, large amounts of data have accumulated in every field, and the volume of government affairs data has grown rapidly as well. These data contain rich information resources and are of great value for government governance. However, they are large in volume, complex, scattered, low in value density, and weakly connected. How to quickly obtain valuable information from them has become one of the problems that must be faced in building an intelligent government.

For example, the portal websites of government departments publish large amounts of text related to government affairs, including departmental functions, news, work deployments, and policy documents, which is an important data source for government governance. However, this information is usually released in chronological order (meetings, visits, speeches, and so on), and most of it is long text. The public generally accesses it by browsing lists and cannot quickly pick out the key information from long texts, so the user experience is poor and the utilization rate of information resources is low. There is therefore an urgent need to transform unstructured government affairs text into a structured form that describes the concepts in this field and the relationships between them concisely and clearly, freeing people from having to read large amounts of text to obtain knowledge.

Knowledge extraction extracts specific entities, attributes, and events from large amounts of heterogeneous data and organizes the resulting knowledge in a highly structured way. To address the scattered data in the government affairs field and the inefficient, incomplete access the public has to this information, this thesis studies knowledge extraction for government affairs, including named entity recognition, event trigger word extraction, and event argument role extraction. Entities and events are extracted from unstructured data and stored in a graph database to form an event knowledge graph. On this basis, an intelligent retrieval system for government affairs knowledge is built, providing front-end knowledge retrieval and back-end knowledge maintenance. The main work and innovations are as follows:

A Python crawler is used to collect more than 40,000 news items from the target websites, covering conference reports, research visits, and other topics, and a government affairs database is built from them. On this basis, the categories and trigger word list of meta-events are defined, and part of the data is annotated.

RNNs are difficult to parallelize and have high time complexity on long sequences, while the Transformer model suffers from context fragmentation when capturing direct relationships between distant words. We therefore combine Transformer-XL with a CRF layer to design the TF-CRF model for named entity recognition. To reduce the amount of annotated data required and to improve model performance with unlabeled data, a three-stage training method is adopted. In the first stage, unsupervised training is carried out on a general corpus to learn the basic regularities of general language. In the second stage, unsupervised training is carried out on government affairs data to learn the language characteristics of the domain. In the third stage, supervised training is carried out on annotated government affairs data to learn named entity recognition. The result is better than the baseline model.

When trigger word extraction and argument extraction are executed separately (the pipeline method), errors propagate from one step to the next and no information flows between the two tasks. To address this, we design a joint prediction model (JPM) based on Transformer-XL, referred to as EE-JPM, that extracts trigger words and argument roles simultaneously. The weights of the trained TF-CRF named entity recognition model are used to initialize EE-JPM. After EE-JPM is fine-tuned on annotated data, its inference accuracy is higher than that of the separately executed pipeline model.

After entities and events have been identified, the Neo4j graph database is used to store the entity extraction and event extraction results in structured form, constructing an event knowledge graph for the government affairs field. Based on this event graph, an intelligent retrieval system for government affairs knowledge is built. Besides routine search, the system displays the structured knowledge behind important event query results, such as people, institutions, and conferences, and supports association queries and analysis over this knowledge.
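
As a rough illustration of the final storage step, the sketch below maps one extracted event (a trigger word plus role-labelled arguments) to parameterized Cypher statements that could be sent to a graph database such as Neo4j. It is a minimal sketch under stated assumptions, not the thesis's implementation: the `Event` and `Argument` classes, the `Event` and `Entity` node labels, the `HAS_ARGUMENT` relationship, and all property names are hypothetical choices made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Argument:
    role: str         # role of the entity in the event (hypothetical role names)
    text: str         # surface form of the entity
    entity_type: str  # entity category from NER (hypothetical labels)


@dataclass
class Event:
    event_id: str
    event_type: str   # meta-event category
    trigger: str      # trigger word found in the text
    arguments: List[Argument] = field(default_factory=list)


def event_to_cypher(event: Event) -> List[Tuple[str, Dict[str, str]]]:
    """Return (query, parameters) pairs that upsert one event and its arguments.

    Assumed schema: (:Event)-[:HAS_ARGUMENT {role}]->(:Entity).
    MERGE is used so that re-loading the same event does not duplicate nodes.
    """
    statements = [(
        "MERGE (e:Event {id: $id}) "
        "SET e.type = $type, e.trigger = $trigger",
        {"id": event.event_id, "type": event.event_type,
         "trigger": event.trigger},
    )]
    for arg in event.arguments:
        statements.append((
            "MATCH (e:Event {id: $id}) "
            "MERGE (n:Entity {name: $name, type: $etype}) "
            "MERGE (e)-[:HAS_ARGUMENT {role: $role}]->(n)",
            {"id": event.event_id, "name": arg.text,
             "etype": arg.entity_type, "role": arg.role},
        ))
    return statements


if __name__ == "__main__":
    # Made-up example record, not data from the thesis.
    demo = Event("evt-1", "Meeting", "held", [
        Argument("Participant", "Person A", "Person"),
        Argument("Place", "City B", "Location"),
    ])
    for query, params in event_to_cypher(demo):
        print(query, params)
```

Keeping roles and entity types as properties rather than as labels lets every value be passed as a query parameter. Each pair could then be executed with a graph database client, for example by passing the query and its parameters to a session's `run` method in the official Neo4j Python driver.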
Keywords/Search Tags: deep learning, entity recognition, event extraction, knowledge acquisition, knowledge graph