
Research And Implementation Of Similarity Computation For Abstracts Of Scientific Papers

Posted on: 2024-02-20
Degree: Master
Type: Thesis
Country: China
Candidate: T J Wang
GTID: 2544307055497914
Subject: Computer technology
Abstract/Summary:
A text similarity metric indicates the degree of semantic similarity between texts. The number of scientific papers has increased dramatically in recent years. Scientific papers are highly interrelated, rigorous, lengthy, and highly structured; their sentences are complex and precise, and they contain rich, highly specialized domain knowledge. The abstract of a paper condenses its main research methods, theories, techniques, and research process. It therefore conveys the main research content of the paper and gives researchers fast, efficient access to useful content. This thesis studies text similarity computation on the abstracts of scientific papers. Using a dataset of scientific papers on COVID-19, it constructs a text similarity model that combines a graph convolutional neural network with a twin (Siamese) convolutional neural network. The experimental results show that the model achieves good results on the text similarity of scientific paper abstracts in this field. The specific work is as follows:

(1) Construction of the COVID-19 scientific knowledge graph. This work uses a dataset of "COVID-19" papers, leading international medical journals, core journals of Chinese medicine, and scientific literature on Western medical treatment, from the Key Laboratory of Intelligent Computing for Data Science of Yunnan University. The data are pre-processed to build an ontology of the abstract text. The ontology and entity definitions are then used for joint knowledge extraction, introducing the concept hierarchy and domain knowledge to construct the triples of the COVID-19 scientific knowledge graph. Finally, the knowledge graph is stored in Neo4j.

(2) Unlabeling and generalization of the COVID-19 scientific knowledge graph. First, representation initialization is applied to the knowledge graph, retaining its structural and semantic information, to obtain the initial embedding vectors. Next, relation initialization assigns the representation vectors to nodes and assigns relations to edge nodes. Finally, relation generalization yields the unlabeled COVID-19 scientific knowledge graph.

(3) Text similarity calculation. Using knowledge graph embedding, entities and relations are embedded into a continuous vector space while preserving the original structure of the knowledge graph. A graph convolutional neural network and a twin convolutional neural network are then constructed to extract features from the unlabeled COVID-19 scientific knowledge graph. Learning the feature representation of the knowledge graph yields text feature vectors, from which the similarity is calculated. In the experiments, Drop_node is used to suppress overfitting of the graph convolutional neural network.
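The pipeline in step (3) can be illustrated with a minimal NumPy sketch. The abstract does not give the thesis's actual architecture, layer sizes, or training procedure, so everything here is an assumption made for illustration: the function names, the single graph convolution layer, mean pooling, cosine similarity as the similarity measure, and a simple node-dropping scheme standing in for Drop_node. The twin branch is approximated by encoding both graphs with the same weight matrix, not by a full convolutional network.

```python
import numpy as np

def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2}, common in graph convolution."""
    a_hat = adj + np.eye(adj.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def drop_node(features, rate, rng):
    """Assumed stand-in for Drop_node: zero whole node rows at random, rescale the rest.

    `rate` must lie in [0, 1). Intended for training time only.
    """
    if rate <= 0.0:
        return features
    keep = rng.random(features.shape[0]) >= rate
    return features * keep[:, None] / (1.0 - rate)

def gcn_layer(adj_norm, features, weight):
    """One graph convolution step: ReLU(A_norm @ X @ W)."""
    return np.maximum(adj_norm @ features @ weight, 0.0)

def encode_graph(adj, features, weight, rate=0.0, rng=None):
    """Encode one graph as a vector: drop nodes, convolve, mean-pool over nodes."""
    if rng is None:
        rng = np.random.default_rng()
    x = drop_node(features, rate, rng)
    h = gcn_layer(normalize_adjacency(adj), x, weight)
    return h.mean(axis=0)

def cosine_similarity(u, v, eps=1e-12):
    """Cosine of the angle between two feature vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + eps))

# Two toy "abstract" graphs with hypothetical node embeddings.
rng = np.random.default_rng(0)
adj_a = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
adj_b = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
feats_a = rng.random((3, 4))
feats_b = rng.random((3, 4))
weight = rng.random((4, 2))   # shared by both branches (the "twin" part)

vec_a = encode_graph(adj_a, feats_a, weight)
vec_b = encode_graph(adj_b, feats_b, weight)
sim = cosine_similarity(vec_a, vec_b)
print(f"similarity: {sim:.3f}")
```

Sharing `weight` between the two calls to `encode_graph` is what makes the branches twins: both graphs are mapped into the same feature space, so their vectors are directly comparable. In a trained model the weights would be learned and node dropping would be enabled only during training.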
Keywords/Search Tags: knowledge graph, knowledge graph embedding, graph convolution, twin networks, text similarity calculation