
Research On Character Activity Elements Identification And Completion Techniques For Text Intelligence

Posted on: 2020-09-19
Degree: Master
Type: Thesis
Country: China
Candidate: J Wang
GTID: 2518306548495934
Subject: Management Science and Engineering
Abstract/Summary:
With the development of big data techniques and the mobile Internet, information spreads online in the form of electronic text, most of it unstructured. Extracting the information people actually need from this mass of unstructured text has become a problem in its own right, and it has given rise to information extraction techniques. This topic comes from the National Science and Technique Major Project and addresses the practical need of military intelligence agencies for rapid and efficient information extraction from text intelligence. It focuses on the activities of characters and carries out research on character activity elements identification and completion techniques for text intelligence. This paper introduces the basic technical knowledge and the module construction process involved in building the System of Character Activity Elements Identification and Completion Techniques for Text Intelligence. The main work is as follows. First, a domain-specific neural Chinese word segmentation method is proposed. The method uses mutual information and entropy to extract potential new words from the text. The resulting new-word dictionary is combined with a term dictionary to segment the domain corpus, and a Bi-LSTM-CRF model is then trained on the segmented material to produce the word segmentation model. Second, a Chinese named entity recognition method based on pretrained word embeddings is proposed, and a military-political corpus with self-defined entity annotations is constructed. BERT-wwm, the best pretrained Chinese word embedding model available at the time of writing, is fine-tuned for the named entity recognition task. Combined with a Bi-LSTM-CRF model, it yields a military-political domain entity recognition model with high generalization performance. Third, the CN-DBpedia knowledge graph is introduced as the system's knowledge base to support entity alignment and information completion. Fourth, based on the characteristics of the corpus, seven types of events reflected in military-political texts are defined, the event types are ranked by priority, and a trigger vocabulary is compiled for each type so that events can be extracted by matching triggers. Finally, the back-end programs of the entity recognition, entity alignment, information completion, and event extraction modules are integrated and packaged, and a front-end demonstration interface based on the Flask framework is developed. The completed System of Character Activity Elements Identification and Completion Techniques for Text Intelligence lets intelligence personnel enter text intelligence through a visual interface and receive the extracted character activity elements as feedback, bringing the techniques into practical use.
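The new-word discovery step described in the abstract (mutual information plus entropy) can be illustrated with a minimal sketch. The abstract does not give the exact formulas or thresholds, so everything below is an assumption: cohesion is measured as the minimum pointwise mutual information over all binary splits of a candidate, boundary freedom as the entropy of its left and right neighbouring characters, and the function names, `max_len`, and threshold values are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(counter):
    """Shannon entropy (in nats) of a Counter of neighbouring characters."""
    total = sum(counter.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counter.values())


def score_candidates(text, max_len=4):
    """Score every character n-gram of length 2..max_len.

    Returns {gram: (count, pmi, left_entropy, right_entropy)}.
    """
    n = len(text)
    counts = Counter()
    left = defaultdict(Counter)   # characters seen immediately before a gram
    right = defaultdict(Counter)  # characters seen immediately after a gram
    for size in range(1, max_len + 1):
        for i in range(n - size + 1):
            gram = text[i:i + size]
            counts[gram] += 1
            if size >= 2:
                if i > 0:
                    left[gram][text[i - 1]] += 1
                if i + size < n:
                    right[gram][text[i + size]] += 1

    def prob(gram):
        # Relative frequency among all n-grams of the same length.
        return counts[gram] / (n - len(gram) + 1)

    scores = {}
    for gram, count in counts.items():
        if len(gram) < 2:
            continue
        # Cohesion: the weakest pointwise mutual information over all splits.
        pmi = min(
            math.log(prob(gram) / (prob(gram[:k]) * prob(gram[k:])))
            for k in range(1, len(gram))
        )
        scores[gram] = (count, pmi, entropy(left[gram]), entropy(right[gram]))
    return scores


def discover_new_words(text, min_count=2, min_pmi=1.0, min_entropy=0.5, max_len=4):
    """Keep candidates that are frequent, cohesive, and free on both sides."""
    return sorted(
        gram
        for gram, (count, pmi, left_h, right_h) in score_candidates(text, max_len).items()
        if count >= min_count and pmi >= min_pmi and min(left_h, right_h) >= min_entropy
    )
```

On a real domain corpus the surviving candidates would still need to be filtered against an existing dictionary before being treated as new words; that step is omitted here.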
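Trigger-based event extraction with prioritized event types, as outlined in the abstract, might look roughly like the sketch below. The abstract does not name the seven event types or their trigger words, so the event types, triggers, and priority order here are purely hypothetical placeholders, given in English for readability, and the matching is plain substring search rather than anything taken from the thesis.

```python
# Hypothetical event types and trigger words, ordered from highest to
# lowest priority. The real system defines seven types for Chinese text.
EVENT_TRIGGERS = [
    ("visit", ["visited", "arrived in"]),
    ("meeting", ["met with", "held talks"]),
    ("statement", ["said", "announced"]),
]


def extract_event(sentence):
    """Return (event_type, trigger) for the highest-priority match, else None.

    Matching is a case-insensitive substring test, so a trigger can also
    match inside a longer word; a real system would match on segmented tokens.
    """
    lowered = sentence.lower()
    for event_type, triggers in EVENT_TRIGGERS:
        for trigger in triggers:
            if trigger in lowered:
                return event_type, trigger
    return None
```

Because the list is scanned in priority order, a sentence containing triggers for several event types is assigned only the highest-ranked one, which is one simple way to realize the priority ordering the abstract mentions.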
Keywords/Search Tags: Character Activity, Neural Network, Elements Identification, Information Completion, Named Entity Recognition