Research On Knowledge Graph Construction Technology For Tables In Electric Power PDF Documents

Posted on: 2024-01-01  Degree: Master  Type: Thesis
Country: China  Candidate: R Zhang  Full Text: PDF
GTID: 2542307124460334  Subject: Electronic information
Abstract/Summary:
The power industry is an important pillar of China’s economy, and its industry standards are usually stored as PDF documents. These PDFs often contain a large amount of tabular data, whose main purpose is to present information to people in an intuitive way. However, tabular data is not easy for computers to process, which makes it difficult to fully exploit its value. Traditional string-based fuzzy-matching queries not only fail to meet users’ personalized needs but also often return a lot of irrelevant information. This thesis therefore focuses on constructing a knowledge graph from the tabular data in power industry standards. We extract the data from tables in PDF documents and convert it into a structured RDF knowledge graph. Furthermore, we build a power knowledge question-answering framework based on information retrieval to simplify management and querying. The main contents of this thesis include the following two aspects:

(1) A bottom-up method for constructing a domain knowledge graph from tabular data in power industry standard PDF documents is proposed. First, Tabula is used to extract tables from the PDF documents. Then, the Cell Rule Language is used to normalize all tables and transform them into triples. Finally, a power domain ontology is constructed from power industry standard documents, and the table data is mapped to ontology classes through table interpretation tasks to enrich the background knowledge of the table data. Experiments on the national power industry standard dataset yield an RDF knowledge graph of power industry standards that contains 13,400 triples and 20 related basic concept classes.

(2) A question-answering method for the power knowledge graph based on information retrieval is proposed, and a question-answering framework is built. First, the topic entities of a natural language question are identified by part-of-speech tagging and dependency parsing and linked to the power knowledge graph. Then, a set of candidate answers is generated using candidate path templates. Finally, the semantic similarity between the candidate answers and the natural language question is computed with a representation-based pre-trained BERT model. The path score is combined with the text similarity, and the tail entity of the candidate path with the highest score is taken as the answer. Additionally, an electric power question-answering dataset containing 3,378 question-answer pairs is constructed to verify the validity of the framework.

The research in this thesis has theoretical significance for extracting knowledge from tabular data. The power knowledge graph built from real data covers most of the structured knowledge in power standards, which has practical value for the intelligent upgrade of power-related industries.
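
As a rough illustration of the table-to-triples step described in contribution (1), the sketch below turns one already-extracted, normalized table into subject-predicate-object triples in an N-Triples-like form. It is not the thesis's Cell Rule Language pipeline: the `BASE` namespace, the helper names (`iri`, `literal`, `table_to_triples`), the "first column is the subject" convention, and the sample table are all assumptions made for this example.

```python
from urllib.parse import quote

# Hypothetical namespace; the thesis does not state the IRIs it uses.
BASE = "http://example.org/power/"


def iri(local_name: str) -> str:
    """Build an IRI under the hypothetical BASE namespace."""
    return f"<{BASE}{quote(local_name.strip().replace(' ', '_'))}>"


def literal(value: str) -> str:
    """Serialize a cell value as a string literal (minimal escaping only)."""
    escaped = value.strip().replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def table_to_triples(header, rows):
    """Assumed convention: the first column names the row subject and
    every other column header acts as a predicate. Empty cells are skipped."""
    triples = []
    for row in rows:
        subject = iri(row[0])
        for column, cell in zip(header[1:], row[1:]):
            if cell.strip():
                triples.append((subject, iri(column), literal(cell)))
    return triples


if __name__ == "__main__":
    # Made-up sample table, not data from the thesis.
    header = ["Equipment", "Rated voltage", "Test duration"]
    rows = [
        ["Transformer A", "110 kV", "60 s"],
        ["Cable B", "35 kV", ""],
    ]
    for s, p, o in table_to_triples(header, rows):
        print(f"{s} {p} {o} .")
```

In a fuller pipeline, the `header` and `rows` inputs would come from a table extractor such as Tabula, and the predicates would be mapped to ontology classes and properties rather than generated directly from column headers.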
Keywords/Search Tags: knowledge graph, construction, table data, question answering