Humanities and social sciences research is an essential tool for studying and understanding human society, culture, and history. In recent years, with the rapid development of big data and artificial intelligence technologies, combining AI with internet-scale data for knowledge discovery has become a prominent topic in humanities and social sciences research. The Silk Road is a famous ancient trade route that connected China with the rest of the Eurasian continent and influenced world history, culture, and economics. However, the available data on the Silk Road exists primarily as semi-structured and unstructured text, which makes manual data extraction costly. This research applies natural language processing techniques and deep learning models to automatically extract Silk Road-themed knowledge and construct a Silk Road knowledge graph. The main research contents include:

(1) Research on Named Entity Recognition (NER) Methods: Preprocessed Silk Road literature is annotated with named entities using a combination of exact string matching and manual labeling, resulting in a standardized dataset containing 13 types of named entities and 35,810 entities. A BERT-IDCNN-BiLSTM-CRF model is employed, in which the pre-trained BERT model encodes the input text and a hybrid of IDCNN and BiLSTM extracts local and contextual features. Finally, a CRF layer imposes global constraints on the predicted label sequence for named entity recognition. Comparative experiments are conducted against the BiLSTM-CRF, IDCNN-CRF, BiLSTM-Attention-CRF, BERT-CRF, BERT-BiLSTM-CRF, and BERT-IDCNN-CRF models on the MSRA, People’s Daily, and Silk Road datasets. The BERT-IDCNN-BiLSTM-CRF model achieves F1 scores of 94.90%, 94.14%, and 90.98% on the three datasets, respectively, outperforming the other models.

(2) Research on Entity Relation Extraction Methods: Entities are labeled with relations using a combination of rule templates and manual annotation, resulting in a standardized dataset containing seven types of relations and 4,095 relation instances. A joint relation extraction framework based on BERT and an attention mechanism is proposed, dividing relation extraction into three modules: entity extraction, relation extraction, and entity semantic recognition. The entity extraction module uses BERT-IDCNN-BiLSTM to encode the sentence and extract entity start and end positions. The relation extraction module encodes a custom relation set using BERT and uses an attention mechanism to compute the potential relations in a sentence. The entity semantic recognition module enumerates candidate triplets in the sentence and identifies the semantic role (subject or object) of each entity. On the Silk Road dataset, this method achieves precision, recall, and F1 scores of 81.13%, 80.46%, and 80.79%, respectively. Furthermore, the extraction of overlapping relations and the recognition of triplets in sentences with varying structures are explored in depth.

(3) Construction of the Silk Road Knowledge Graph: For the relation triplets obtained through relation extraction and manual annotation, entity similarity is computed from Word2Vec cosine similarity and Levenshtein distance so that entities with similar meanings can be matched. This process results in 2,421 entities and 6,157 relation instances, which are stored in a Neo4j database. Finally, a question-answering system is built on the Silk Road knowledge graph, providing automatic Q&A and knowledge graph query functions.
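
The entity matching step in (3) can be illustrated with a minimal sketch. The abstract names the two measures (cosine similarity over Word2Vec vectors and Levenshtein distance) but does not say how they are combined, so the weighted average, the `alpha` parameter, and all function names below are assumptions made for illustration. The vectors are hand-written placeholders, not the output of a trained Word2Vec model.

```python
import math


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def string_similarity(a: str, b: str) -> float:
    """Levenshtein distance rescaled to [0, 1]; 1.0 means identical strings."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def cosine_similarity(u, v) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def entity_similarity(name_a, name_b, vec_a, vec_b, alpha=0.5) -> float:
    """Assumed combination: weighted average of the two similarities."""
    return (alpha * cosine_similarity(vec_a, vec_b)
            + (1 - alpha) * string_similarity(name_a, name_b))


# Placeholder embeddings standing in for Word2Vec vectors of two spellings.
vec_a = [0.9, 0.1, 0.3]
vec_b = [0.8, 0.2, 0.3]
score = entity_similarity("Chang'an", "Changan", vec_a, vec_b)
print(f"combined similarity: {score:.3f}")
```

In this sketch, two entity names would be treated as the same node when the combined score exceeds a chosen threshold; both the threshold and `alpha` would need to be tuned on the actual data. For the two spellings above, the string component alone is 0.875, since one deletion separates them across eight characters.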