Knowledge graphs allow massive and messy online information to be organized and used effectively. Online encyclopedias contain a vast amount of knowledge in structured and semi-structured form, which makes them one of the most commonly used sources for building knowledge graphs. However, most existing encyclopedia-based knowledge graphs acquire knowledge only from that structured and semi-structured content. They ignore the large amount of knowledge contained in unstructured articles.

The slot filling task aims to extract the values of a given entity's predefined attributes from unstructured text. Existing slot filling work in knowledge acquisition is mainly based on relation classification methods, so it cannot fill slots whose values are not entities. In addition, the datasets used in existing work are mainly built from news text, and no publicly available slot filling dataset is built from online encyclopedia articles.

To address these problems, this thesis studies slot filling techniques for constructing knowledge graphs from online encyclopedias. It covers three aspects:

1. Methods for constructing a slot filling dataset from online encyclopedia articles. The methods include a schema design method driven by user requirements and a web-based annotation tool. This thesis used these methods to construct a slot filling dataset.
2. Slot filling methods for encyclopedia-based datasets. This thesis proposes a slot value recognition model based on sequence labeling. The model is designed to fit the encyclopedia slot filling dataset and to produce good experimental results. It takes document-level input, computes attention from each word's position relative to the subject, and computes label weights from the position of the sentence containing the word.
3. Evaluation of the proposed methods. This thesis designs a series of comparative experiments to verify the effectiveness of the proposed method. It also evaluates the method's ability to fill slots with non-entity values and to determine slot value boundaries.

In conclusion, this thesis proposes a slot filling algorithm for acquiring knowledge from online encyclopedia articles. Formulating the task as sequence labeling enables the algorithm to extract non-entity slot values. Subject position attention, sentence position weights, and document-level scoring make the algorithm effective on the online encyclopedia dataset. These are the main innovations of this thesis. In addition, the proposed schema design method, which is driven by user requirements, differs from existing methods and makes the dataset more practical. The slot filling dataset constructed in this thesis also fills a gap in related work, which lacks an encyclopedia-based dataset.
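In a sequence labeling formulation, a slot value is any labeled span of tokens, not necessarily a named entity. That is why this formulation can capture descriptive, non-entity values. The sketch below shows how such spans might be decoded from per-token tags. It assumes a BIO tagging scheme (`B-<slot>`, `I-<slot>`, `O`). The function name `decode_bio` and the slot name `trait` are hypothetical illustrations, not the thesis's actual label set or code.

```python
from typing import List, Optional, Tuple


def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (slot, value) spans from a BIO-tagged token sequence.

    Assumes a hypothetical BIO scheme: "B-<slot>" opens a span,
    "I-<slot>" continues it, and "O" marks tokens outside any slot value.
    """
    spans: List[Tuple[str, str]] = []
    current_slot: Optional[str] = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Emit the currently open span, if any, and reset the state.
        nonlocal current_slot, current_tokens
        if current_slot is not None and current_tokens:
            spans.append((current_slot, " ".join(current_tokens)))
        current_slot, current_tokens = None, []

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current_slot, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_slot == tag[2:]:
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open span, closes it.
            flush()
    flush()
    return spans


tokens = ["She", "is", "known", "for", "her", "calm", "and", "patient", "style", "."]
tags = ["O", "O", "O", "O", "O", "B-trait", "I-trait", "I-trait", "I-trait", "O"]
print(decode_bio(tokens, tags))  # [('trait', 'calm and patient style')]
```

The extracted value "calm and patient style" is a descriptive phrase rather than an entity mention. A relation classification approach, which chooses among candidate entities, could not produce a value of this kind.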
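The abstract says that attention is computed from each word's position relative to the subject but does not give the formula. The sketch below shows one plausible form, and it is an assumption rather than the thesis's formulation. Each token's attention score is a content score (a dot product with a query vector) minus a penalty proportional to the token's distance from the nearest subject mention. The names `subject_position_attention`, `alpha`, and `query` are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def subject_position_attention(
    token_vecs: List[List[float]],
    subject_positions: List[int],
    query: List[float],
    alpha: float = 0.5,
) -> List[float]:
    """Attention weights over tokens, biased toward tokens near the subject.

    Hypothetical scoring: score_i = <query, h_i> - alpha * d_i, where d_i is
    the token distance from position i to the closest subject mention.
    """
    scores = []
    for i, vec in enumerate(token_vecs):
        content = sum(q * h for q, h in zip(query, vec))
        distance = min(abs(i - p) for p in subject_positions)
        scores.append(content - alpha * distance)
    return softmax(scores)


def weighted_sum(weights: List[float], token_vecs: List[List[float]]) -> List[float]:
    # Context vector: attention-weighted average of the token vectors.
    dim = len(token_vecs[0])
    return [sum(w * vec[d] for w, vec in zip(weights, token_vecs)) for d in range(dim)]


vecs = [[1.0, 0.0]] * 5
weights = subject_position_attention(vecs, subject_positions=[0], query=[1.0, 0.0])
context = weighted_sum(weights, vecs)
```

When the content scores are equal, the weights fall off monotonically with distance from the subject mention. A real model would learn the query and token representations, and might learn the distance penalty as well, instead of fixing `alpha` by hand.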