Design And Realization Of Securities Knowledge Graph Construction System

Posted on: 2022-06-05  Degree: Master  Type: Thesis
Country: China  Candidate: M C Zhu  Full Text: PDF
GTID: 2518306494981199  Subject: Software engineering
Abstract/Summary:
A knowledge graph is essentially a semantic network that connects different kinds of information into a network of relations. A domain knowledge graph, also known as an industry knowledge graph or vertical knowledge graph, is an industry knowledge base built from professional data for a specific field. It plays an important role in industry-oriented intelligent search, intelligent question answering, intelligent recommendation, and intelligence analysis. At present, domain knowledge graphs are mainly constructed by following the methods and steps used for general knowledge graphs, with data collection, relation extraction, data management, and data application implemented according to the requirements of the domain. This thesis targets the securities field. It collects data from securities announcements published on the Internet, builds a knowledge graph, explores financial relationships among corporate equity holders and personnel, corporate securities products, and so on, and presents the development status of enterprises to provide an investment reference for practitioners. The thesis focuses on the process and key technical methods of constructing a knowledge graph: establishing the knowledge ontology structure from securities announcements; designing system functions such as data collection, task management, knowledge acquisition, and data management; using relevant laboratory tools to carry out data labeling and relation triple extraction for the knowledge graph; and finally completing the development and deployment of the system. The main work of this thesis is as follows.

(1) Design and implementation of data collection and task management. The system is built around the Django framework. Data collection scripts are written with the Selenium automated testing tool, and fault tolerance and duplicate-avoidance mechanisms are added to keep data collection stable. The system abstracts and encapsulates each collection script as a collection task and provides unified management for these tasks. Tasks are executed on a schedule using the Celery asynchronous task framework so that the latest data is obtained from each data source. Congestion in the Celery queue is resolved by dynamically updating Celery routing, which allows collection tasks to execute concurrently.

(2) Construction of the domain knowledge ontology, and design and implementation of relation triples. The spaCy pre-trained Chinese model is used to recognize named entities in the data set, a clustering tool is used to obtain the main entities of each entity type, and the domain ontology structure is constructed based on domain knowledge. After comparison with recent end-to-end relation extraction models, LTRel is selected as the system's relation extraction tool for knowledge extraction, and the Neo4j database is used to store relation triples. The system provides a knowledge query interface and uses the ECharts visualization tool to display query results.

(3) System application deployment. Docker container technology is used to deploy the system on the server side. The system architecture in the development environment is analyzed to identify the services the system depends on. The deployment architecture is then defined, and deployment preparations such as tool installation and the division into Docker containers are completed. A Dockerfile is used to configure and initialize each container, and the Docker Compose tool is used to start the system's multiple containers. These container-related configuration files and folders are added to the original system directory, so the directory structure in the deployment environment differs from that of the development environment. Incremental deployment is used for the multi-container system to resolve compatibility errors encountered during system deployment.
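
The abstract mentions fault tolerance and duplicate avoidance in data collection but does not describe how they are implemented. The following is only a minimal sketch of one common way to combine the two, not the thesis's actual code. The names `fingerprint`, `fetch_with_retry`, and `collect`, the use of a (URL, title) pair as the identity of an announcement, and the in-memory `seen` set are all assumptions made for illustration; `fetch` stands in for whatever Selenium-based page load the real collection script performs.

```python
import hashlib
import time
from typing import Callable, Dict, Iterable, List, Set, Tuple


def fingerprint(url: str, title: str) -> str:
    """Stable key for one announcement (assumed identity: URL plus title)."""
    return hashlib.sha256(f"{url}\n{title}".encode("utf-8")).hexdigest()


def fetch_with_retry(fetch: Callable[[str], str], url: str,
                     retries: int = 3, delay: float = 0.0) -> str:
    """Call fetch(url), retrying on any exception for up to `retries` attempts."""
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return fetch(url)
        except Exception as exc:  # hypothetical: real code would narrow this
            last_error = exc
            if attempt < retries:
                time.sleep(delay * attempt)  # linear back-off between attempts
    raise RuntimeError(f"giving up on {url} after {retries} attempts") from last_error


def collect(items: Iterable[Tuple[str, str]], fetch: Callable[[str], str],
            seen: Set[str], retries: int = 3) -> List[Dict[str, str]]:
    """Fetch each (url, title) item once, skipping items already in `seen`."""
    collected = []
    for url, title in items:
        key = fingerprint(url, title)
        if key in seen:
            continue  # duplicate avoidance: already collected earlier
        try:
            body = fetch_with_retry(fetch, url, retries=retries)
        except RuntimeError:
            continue  # fault tolerance: skip it; a later run can retry it
        seen.add(key)
        collected.append({"url": url, "title": title, "body": body})
    return collected
```

In a deployed system the `seen` set would more plausibly be a database table or a unique index on the fingerprint, so that duplicates are also avoided across scheduled runs. Failed items are deliberately left out of `seen`, so the next scheduled run attempts them again.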
Keywords/Search Tags: Knowledge graph construction, Knowledge ontology structure, Data collection, Task management, Relationship extraction, Docker deployment