Research And Implementation Of The Construction Of Chinese RDF Knowledge Base

Posted on:2017-02-08

Degree:Master

Type:Thesis

Country:China

Candidate:J F Huang

GTID:2308330485977480

Subject:Software engineering

Abstract/Summary:

People can obtain abundant information from the Big Data on the Internet: entering keywords into a search engine returns relevant news and data links. However, acquiring knowledge this way becomes inefficient as the volume of data keeps growing. Information on the Internet is currently stored and published as documents connected by hyperlinks. People can understand the information in such documents, but it is hard for computers to understand its meaning. To make better use of Big Data, some foreign research institutions have built knowledge bases from the English Wikipedia, such as Freebase and DBpedia. There are also knowledge bases in China, such as the Baidu knowledge base, Sogou knowledge cube and Tsinghua Xlore. Knowledge bases are of significant value in the fields of knowledge graphs, data fusion and artificial intelligence question answering. Foreign knowledge bases such as Freebase provide public Resource Description Framework (RDF) data resources, but they contain little Chinese entity information. Building a high-quality Chinese RDF knowledge base has therefore become an active research topic.

Against this background, this thesis studies methods for constructing a high-quality Chinese RDF knowledge base. The work covers the following aspects:

(1) Web crawling technology for large-scale online encyclopedias is studied, and the specific problems and challenges of Web crawling are analyzed. An online encyclopedia crawling system is built by combining the Scrapy framework with the Spring MVC framework. The crawling system performs stably and has a good user interface. An algorithm for automatically extracting proxy IP addresses is then proposed, which extracts proxy IP addresses effectively and addresses the anti-crawling problem.

(2) Techniques for extracting entity information from online encyclopedias are studied, and a method for semantically annotating the extracted information is proposed, using RDFS information and RDF data standardization. RDF data storage based on a graph database is then studied, and an RDF data storage system based on Neo4j is developed. Experiments comparing it with traditional relational database storage show that the system meets the requirements for large-scale RDF data storage and SPARQL query.

(3) The entity alignment problem that arises when constructing a Chinese RDF knowledge base from the heterogeneous data sources Baidu encyclopedia and Hudong encyclopedia is studied. A method of entity alignment based on entities' attributes and the features of context topics is proposed. A comparison with several traditional entity alignment methods shows that the proposed approach is superior to them.

(4) An automatic building system for the Chinese RDF knowledge base is designed and implemented. It combines large-scale online encyclopedia crawling, RDF data transformation, storage and SPARQL query of entity information, and entity alignment across heterogeneous data sources. The system automatically downloads online encyclopedia data according to a configured Web crawling task, extracts the entity information, standardizes the RDF data and stores it in a graph database. It also provides entity information retrieval and SPARQL query functions to external applications.
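
The abstract does not give the formulas behind the entity alignment method in aspect (3). As a minimal illustrative sketch, and not the thesis's actual algorithm, the idea of combining attribute evidence with context-topic evidence can be written as a weighted sum of two similarities. Everything here is an assumption made for illustration: the function names, the use of Jaccard and cosine similarity, the weight `alpha`, and the toy entity records, which are made-up data rather than real encyclopedia entries.

```python
import math


def jaccard(a, b):
    """Jaccard overlap between two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def alignment_score(e1, e2, alpha=0.5):
    """Weighted mix of attribute overlap and topic similarity.

    alpha is a hypothetical tuning weight, not a value from the thesis.
    """
    # Compare (attribute name, value) pairs, so a shared name with a
    # different value does not count as a match.
    attr_sim = jaccard(e1["attributes"].items(), e2["attributes"].items())
    topic_sim = cosine(e1["topics"], e2["topics"])
    return alpha * attr_sim + (1 - alpha) * topic_sim


# Toy records standing in for entries from two encyclopedias (invented data).
entity_a = {
    "attributes": {"name": "A", "born": "1900"},
    "topics": {"physics": 0.8, "teaching": 0.2},
}
entity_b = {
    "attributes": {"name": "A", "born": "1900", "job": "x"},
    "topics": {"physics": 0.7, "teaching": 0.3},
}
entity_c = {
    "attributes": {"name": "A", "born": "1950"},
    "topics": {"music": 0.9, "film": 0.1},
}

if __name__ == "__main__":
    print("a vs b:", alignment_score(entity_a, entity_b))
    print("a vs c:", alignment_score(entity_a, entity_c))
```

In this sketch, two same-named entries with matching attribute values and similar topic weights score higher than two same-named entries about different subjects, which is the kind of disambiguation that a name-only comparison cannot provide.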

Keywords/Search Tags:

knowledge base, resource description framework, web crawling, information extraction, graph database, topic feature, entity alignment

Related items

1	Research On Multi-Feature-Based Entity Alignment Method For Knowledge Graphs
2	Research And Implementation Of Entity Alignment Technology Based On Multi-modal Knowledge Graph
3	Research And Application Of Entity Alignment Technology Based On Association Information Modeling
4	Research On Entity Alignment Method Based On Joint Robust Knowledge Graph And Attribute Fusion
5	Research On The Method Of Entity Alignment With Attribute Enhancement Based On Graph Convolution
6	Knowledge Graph-based Knowledge Acquisition System In Open Space
7	The Design And Implementation Of Domain Knowledge Base Management System Based On Knowledge Graph
8	Research On Information Extraction Method For Knowledge Graph Construction In Industrial Field
9	Study On Key Technologies Of Entity Alignment Between Knowledge Graphs In Open Environment
10	Research On Entity Alignment Method Of Knowledge Graph In Domain Of Social Insurances And Housing Fund