
Attribute Mining And Knowledge Graph Construction Based On Introduction Text

Posted on: 2021-01-07 | Degree: Master | Type: Thesis
Country: China | Candidate: Y F Yang
GTID: 2428330605476992 | Subject: Software engineering major
Abstract/Summary:
At present, the Internet holds a large amount of encyclopedia data, which is generally displayed to users as entry web pages. An entry page mainly consists of modules such as the entry introduction text and the entry basic information table. The introduction text is descriptive text whose subject is the current entry, and the basic information table is a tabular module of attribute knowledge. The basic information table is neatly formatted but has flaws: attribute values are not normalized, and the same attribute name is expressed in many different ways. Although the introduction text has no fixed format, it contains a wealth of knowledge waiting to be mined. We aim to combine the strengths of the two: the basic information table serves as the base data, while additional knowledge triples are mined from the introduction text to enrich the knowledge base. The main contents of this thesis are as follows:

(1) Research on attribute recognition based on a hybrid strategy. Taking the set of person attributes as the experimental object, we use distant supervision to annotate data automatically and build a person attribute recognition system. Depending on the characteristics of each attribute, either rule-based or model-based attribute recognition is used. In the model-based method, the task is cast as a sequence labeling task. We experiment with several mainstream sequence labeling models, compare their performance, and build the system on the best-performing one. The experimental results show that the hybrid-strategy attribute recognition proposed in this thesis achieves good recognition performance.

(2) Research on alias recognition based on Bootstrapping and a joint model. Unlike regular attributes, an alias is a special entity attribute, so we recognize it separately. This thesis mines person aliases and tourist attraction aliases separately. Person alias mining builds its dataset with distant supervision and targets accuracy: pattern iteration mines a set of candidate aliases, and a classifier we construct then judges whether each mined result is correct. In the tourism scenario, the data are constructed by manual annotation, and we study scenic-spot aliases in depth. Tourist attraction alias mining mainly targets F1, and we experiment with both a pipeline model and a joint model. Repeated experimental comparisons show that the joint model is better suited to this task than the pipeline model.

(3) Research on person knowledge graph construction based on encyclopedia data. Taking the person domain as an example, we describe how to construct a person knowledge graph. Using a crawler, we obtained web pages from multiple encyclopedia sources. These data are integrated, cleaned, and attribute-normalized, and entries whose entity type is person are extracted. On this basis, the basic attributes of a person are defined, and the entity data are encoded and stored in the knowledge base. In addition, drawing on the attribute recognition work in this thesis, we perform attribute completion and error correction on the existing knowledge base, which improves the attribute coverage of each person in it. Finally, we built a knowledge graph display system through which users can view the information.

In summary, this thesis takes the encyclopedia data currently available on the Internet as its base knowledge base. On this basis, the knowledge is organized and cleaned, and attributes are mined from the introduction text. By comparing different methods and models, we select the most suitable one for each subtask and use it to complete and correct the knowledge base, continuously enriching its knowledge. Finally, taking the person domain as an example, a person knowledge graph is constructed on top of the knowledge base.
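As an illustration of the rule-based half of the hybrid strategy in (1): attributes with a fixed surface form, such as dates, can be recognized with simple patterns. This is a minimal sketch under assumed conditions. The attribute names, the English regexes, and the function `extract_rule_based` are hypothetical; the abstract does not list the thesis's actual rules, and its source texts are Chinese encyclopedia entries.

```python
import re

# Hypothetical rule table for the rule-based branch of a hybrid strategy:
# attribute name -> regex whose first capturing group is the attribute value.
# Attribute names and patterns are illustrative, not taken from the thesis.
RULES = {
    "birth_date": re.compile(r"\bborn (?:on )?([A-Z][a-z]+ \d{1,2}, \d{4})"),
    "height": re.compile(r"\b(\d\.\d{2}) ?m tall\b"),
}


def extract_rule_based(text):
    """Return {attribute: value} for every rule whose pattern matches the text."""
    found = {}
    for attr, pattern in RULES.items():
        match = pattern.search(text)
        if match:
            found[attr] = match.group(1)  # keep the first match only
    return found


intro = "Li Ming was born on March 3, 1985 in Nanjing and is 1.80 m tall."
print(extract_rule_based(intro))
```

Rules like these are cheap and precise when an attribute's value has a predictable shape, which is the usual argument for reserving learned sequence labelers for attributes whose values vary more freely.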
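The distant supervision and sequence labeling steps in (1) fit together as follows: if a value from an entry's basic information table also appears in that entry's introduction text, the occurrence is assumed to express that attribute and is tagged with BIO labels for training a sequence labeler. The sketch below assumes whitespace tokenization (Chinese text would typically be labeled per character), exact string matching, and a hypothetical function name `distant_bio_labels`; the tag names are illustrative.

```python
def distant_bio_labels(tokens, infobox):
    """Hypothetical distant-supervision labeler producing BIO tags.

    tokens:  list of tokens from the introduction text
    infobox: dict mapping attribute name -> attribute value string
    """
    labels = ["O"] * len(tokens)
    for attr, value in infobox.items():
        value_tokens = value.split()
        n = len(value_tokens)
        if n == 0:
            continue  # an empty value cannot be aligned to the text
        for start in range(len(tokens) - n + 1):
            span = range(start, start + n)
            # Label exact matches only, and never overwrite an earlier match.
            if tokens[start:start + n] == value_tokens and all(labels[i] == "O" for i in span):
                labels[start] = f"B-{attr}"
                for i in range(start + 1, start + n):
                    labels[i] = f"I-{attr}"
    return labels


tokens = "Zhang San was born in Hangzhou and graduated from Zhejiang University".split()
infobox = {"birthplace": "Hangzhou", "alma_mater": "Zhejiang University"}
for tok, tag in zip(tokens, distant_bio_labels(tokens, infobox)):
    print(tok, tag)
```

The matching assumption is what makes distant supervision cheap, and also what makes its labels noisy: a value can appear in the text without expressing the attribute, so automatically labeled data usually needs filtering or a noise-tolerant model.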
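The pattern iteration used for alias mining in (2) follows the general Bootstrapping scheme: start from seed (entity, alias) pairs, take the text between each pair as a pattern, apply the patterns to find new candidate pairs, and repeat. This is a generic, simplified sketch and not the thesis's implementation. It assumes English toy sentences, patterns defined as the literal text between entity and alias, capitalized phrases as candidate names, and no pattern scoring; `bootstrap_aliases` is a hypothetical name.

```python
import re


def bootstrap_aliases(sentences, seeds, iterations=2):
    """Hypothetical Bootstrapping loop: grow (entity, alias) pairs from seeds."""
    pairs = set(seeds)
    patterns = set()
    for _ in range(iterations):
        # Step 1: learn patterns as the literal text between a known entity and its alias.
        for entity, alias in pairs:
            for s in sentences:
                m = re.search(re.escape(entity) + r"(.+?)" + re.escape(alias), s)
                if m:
                    patterns.add(m.group(1))
        # Step 2: apply each pattern; capitalized phrases on both sides become candidate pairs.
        for p in patterns:
            regex = re.compile(r"([A-Z][\w ]*?\w)" + re.escape(p) + r"([A-Z][\w ]*\w)")
            for s in sentences:
                for entity, alias in regex.findall(s):
                    pairs.add((entity, alias))
    return pairs, patterns


sentences = [
    "Beijing, also known as Peking, is the capital of China.",
    "Guangzhou, also known as Canton, lies on the Pearl River.",
    "Mount Huangshan, also known as Yellow Mountain, is in Anhui.",
]
pairs, patterns = bootstrap_aliases(sentences, {("Beijing", "Peking")})
print(sorted(pairs))
print(patterns)
```

On real text, a loose candidate rule like this also captures sentence-initial words or over-long spans, and a pattern learned from one wrong pair spreads the error in later iterations (semantic drift). A separate step that judges each mined candidate, like the classifier described in (2), is the natural counterweight.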
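The attribute normalization and completion steps in (3) can be pictured over (subject, predicate, object) triples: synonymous attribute names from different encyclopedia sources map to one canonical name, and attributes recognized from introduction text fill only the slots the base data leaves empty. This is a minimal sketch; the mapping table, attribute names, the first-value-wins and fill-only-if-missing policies, and all function names are hypothetical, and the error-correction step described in (3) is not modeled.

```python
# Hypothetical mapping from raw attribute names (as they appear in different
# encyclopedia infoboxes) to canonical attribute names.
CANONICAL = {
    "born": "birth_date",
    "date of birth": "birth_date",
    "birthplace": "birth_place",
    "place of birth": "birth_place",
}


def normalize(infobox):
    """Map raw attribute names to canonical ones; unknown names are kept, lowercased."""
    out = {}
    for name, value in infobox.items():
        raw = name.strip().lower()
        key = CANONICAL.get(raw, raw)
        out.setdefault(key, value.strip())  # keep the first value seen for a canonical name
    return out


def complete(kb, entity, recognized):
    """Add recognized attributes only where the knowledge base has no value yet."""
    record = kb.setdefault(entity, {})
    added = []
    for attr, value in recognized.items():
        if attr not in record:
            record[attr] = value
            added.append(attr)
    return added


def to_triples(kb):
    """Flatten the knowledge base into sorted (subject, predicate, object) triples."""
    return sorted((s, p, o) for s, attrs in kb.items() for p, o in attrs.items())


kb = {"Li Ming": normalize({"Date of Birth": "1985-03-03", " Born ": "1985"})}
complete(kb, "Li Ming", {"birth_date": "1985-03-03", "birth_place": "Nanjing"})
for triple in to_triples(kb):
    print(triple)
```

Filling only missing slots keeps the curated table authoritative; a correction step would instead need a policy for when text evidence should override an existing value.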
Keywords/Search Tags: Knowledge graph, Attribute recognition, Distant supervision, Neural networks, Joint model