| With the rapid development of the Internet and the advent of the 5G era, the volume of information on online media is growing faster than ever. How to structure this huge and diverse body of Chinese text has become an important research topic in natural language processing, and entity relationship extraction, a basic task in that field, is therefore of considerable research significance. For entity relationship extraction from Chinese text, the thesis improves both the preprocessing of the text corpus and the way the model extracts features. The main research work is as follows: (1) The thesis proposes a text preprocessing algorithm based on anaphora resolution to address the difficulties that long Chinese texts, unclear references, and large distances between entities cause for entity relationship extraction. First, a long text pruning algorithm based on anaphora resolution processes the long text corpus through anaphora completion and short sentence pruning, which enhances entity information and eliminates the influence of irrelevant text on relationship classification. Second, a shortest dependency path algorithm built on the long text pruning algorithm performs dependency parsing on the pruned text and retains only the shortest path between the entity-related phrase nodes, further reducing the influence of irrelevant information on relationship classification. Comparative experiments on a Chinese data set verify the feasibility and effectiveness of the proposed algorithm. (2) The thesis proposes an entity relationship extraction model based on long-sentence pruning to handle the complex grammar and frequent references found in long Chinese texts. The text is preprocessed with the anaphora-resolution-based preprocessing algorithm, and a variety of semantic features are extracted as auxiliary information. The embedding layer uses the BERT model to produce word vectors for the text phrases. The encoding layer uses a bidirectional long short-term memory (BiLSTM) network to extract semantic features from the text vectors, which are then weighted by an attention mechanism, and a multi-class classifier finally determines the relationship category. Comparative experiments on a Chinese data set verify the feasibility and effectiveness of the proposed algorithm and model. (3) Based on the proposed algorithm and model, the thesis designs and implements an intelligent knowledge graph construction system that integrates entity relationship extraction, knowledge graph construction, knowledge graph display, and other functions. The system has passed functional and non-functional testing and can be operated and used effectively. |
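
The shortest dependency path step in item (1) can be illustrated with a minimal sketch. This is not the thesis's implementation: it assumes the dependency parse is already available as an array of head indices, treats the tree as an undirected graph, and finds the path between two entity tokens by breadth-first search. The function name `shortest_dependency_path`, the example sentence, and its hand-written parse are all hypothetical, and English tokens are used only for readability.

```python
from collections import deque


def shortest_dependency_path(heads, source, target):
    """Return the token indices on the shortest path between two tokens.

    heads[i] is the index of token i's syntactic head, or -1 for the root.
    The dependency tree is treated as an undirected graph.
    """
    # Build an undirected adjacency list from the head array.
    neighbors = {i: [] for i in range(len(heads))}
    for child, head in enumerate(heads):
        if head >= 0:
            neighbors[child].append(head)
            neighbors[head].append(child)

    # Breadth-first search, remembering each token's predecessor.
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for nxt in neighbors[node]:
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)

    if target not in previous:
        return []  # no path: the input was not one connected tree

    # Walk back from the target to the source, then reverse.
    path = []
    node = target
    while node is not None:
        path.append(node)
        node = previous[node]
    return path[::-1]


# Hypothetical hand-written parse (assumption, not from the thesis).
tokens = ["Alice", "who", "lives", "in", "Paris", "founded", "Acme"]
heads = [5, 2, 0, 4, 2, -1, 5]  # "founded" is the root

path = shortest_dependency_path(heads, source=0, target=6)
kept = [tokens[i] for i in path]
print(kept)  # ['Alice', 'founded', 'Acme']
```

In this toy example the relative clause "who lives in Paris" falls off the path between the two entities, which mirrors the stated goal of discarding text that is irrelevant to the relationship between them.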