
Research On High-tech Vocabulary Relation Extraction Model Based On The Combination Of Feature Vector And Kernel Function

Posted on: 2020-07-20
Degree: Master
Type: Thesis
Country: China
Candidate: Q R Chen
Full Text: PDF
GTID: 2428330578952875
Subject: Computer application technology
Abstract/Summary:
In the era of big data, and with the continued development of artificial intelligence technology, the volume and variety of data on the Internet keep growing. These data include discrete, unstructured, fragmented data as well as semi-structured data held in open knowledge bases. Network users and researchers must therefore reorganize the data before using it, that is, transform it into structured, easy-to-use data through information extraction technology. The resulting structured data can be used to construct knowledge graphs or be embedded in large database systems to improve the efficiency of information retrieval and to present knowledge to users in a more structured way, which greatly improves the user experience. Information extraction mainly covers the extraction of entities, entity attributes, and relationships between entities. Among these, entity relationships are of great significance for building domain knowledge graphs. Under the vision of "the Silk Road Economic Belt and 21st-Century Maritime Silk Road (B&R)", teaching Chinese as a foreign language to the countries along the Belt and Road is becoming increasingly important. This thesis focuses on the extraction of semantic relations between Chinese named entities, and its main work is as follows:

(1) Vocabulary covering up to 12 categories of national high-tech terms under the "Belt and Road" strategy was crawled from multiple source websites. For the two traditional supervised approaches to entity relationship extraction, feature vector-based extraction and kernel function-based extraction, appropriate representations of relationship instances were constructed, and the extraction performance of the two methods on the eight predefined entity relationship categories was analyzed.

(2) To address the shortcomings of these traditional Chinese entity relation extraction methods, an improved hybrid relation extraction model is proposed. In the improved model, the flat (planar) features and the structural features of relationship instances are weighted and combined. The experiments are based on the SVM algorithm with multi-fold cross-validation. The method improves classification performance, and the effectiveness of the improved model is confirmed by analyzing the extraction results under different weight ratios.

(3) Using the improved extraction model proposed in this thesis, entity relationship triples are extracted for the vocabulary related to the twelve high-tech categories, and the corresponding knowledge graph is constructed. The extracted relational triples are applied to the digital media library of the Chinese language teaching system for the countries along the "Belt and Road", where they can provide teaching cases with richer content and more diverse forms.
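The weighted combination of flat and structural features described in item (2) can be illustrated with a composite kernel. The sketch below is not the thesis's implementation: it assumes the flat features are sparse feature dictionaries compared by cosine similarity, that the structural features are parse trees compared by a Collins-Duffy style convolution tree kernel, and that the two are mixed by a single weight `alpha`. The function names, the decay factor `lam`, and the toy instances are all hypothetical.

```python
import math

# A tree is a nested tuple (label, child, ...); a leaf is a plain string.

def label(node):
    return node if isinstance(node, str) else node[0]

def production(node):
    # Grammar production at an internal node: (parent label, child labels).
    return (node[0], tuple(label(c) for c in node[1:]))

def internal_nodes(tree):
    if isinstance(tree, str):
        return []
    nodes = [tree]
    for child in tree[1:]:
        nodes.extend(internal_nodes(child))
    return nodes

def delta(n1, n2, lam):
    # Weighted count of common tree fragments rooted at n1 and n2.
    if production(n1) != production(n2):
        return 0.0
    score = lam
    for c1, c2 in zip(n1[1:], n2[1:]):
        if isinstance(c1, str) or isinstance(c2, str):
            continue  # leaves contribute nothing beyond the production match
        score *= 1.0 + delta(c1, c2, lam)
    return score

def tree_kernel(t1, t2, lam=0.5):
    # Convolution tree kernel: sum delta over all pairs of internal nodes.
    return sum(delta(a, b, lam)
               for a in internal_nodes(t1) for b in internal_nodes(t2))

def normalized_tree_kernel(t1, t2, lam=0.5):
    denom = math.sqrt(tree_kernel(t1, t1, lam) * tree_kernel(t2, t2, lam))
    return tree_kernel(t1, t2, lam) / denom if denom else 0.0

def cosine_kernel(f1, f2):
    # Flat-feature kernel over sparse {feature name: value} dictionaries.
    dot = sum(v * f2.get(k, 0.0) for k, v in f1.items())
    n1 = math.sqrt(sum(v * v for v in f1.values()))
    n2 = math.sqrt(sum(v * v for v in f2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def composite_kernel(x1, x2, alpha=0.5, lam=0.5):
    # alpha weights the flat features, (1 - alpha) the structural features.
    flat = cosine_kernel(x1["features"], x2["features"])
    struct = normalized_tree_kernel(x1["tree"], x2["tree"], lam)
    return alpha * flat + (1.0 - alpha) * struct

# Toy relationship instances (hypothetical, for illustration only).
inst_a = {
    "features": {"e1_type=TECH": 1.0, "e2_type=ORG": 1.0, "dist=2": 1.0},
    "tree": ("S", ("NP", ("NN", "chip")), ("VP", ("VB", "uses"), ("NP", ("NN", "silicon")))),
}
inst_b = {
    "features": {"e1_type=TECH": 1.0, "e2_type=ORG": 1.0, "dist=3": 1.0},
    "tree": ("S", ("NP", ("NN", "laser")), ("VP", ("VB", "uses"), ("NP", ("NN", "crystal")))),
}

if __name__ == "__main__":
    for alpha in (0.0, 0.3, 0.5, 0.7, 1.0):
        print(alpha, composite_kernel(inst_a, inst_b, alpha=alpha))
```

A Gram matrix built from `composite_kernel` could then be passed to an SVM that accepts precomputed kernels (for example scikit-learn's `SVC(kernel="precomputed")`), and sweeping `alpha` under cross-validation would mirror the comparison of different weight ratios mentioned in the abstract.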
Keywords/Search Tags:the Belt and Road, Entity Relationship Extraction, Knowledge Graph, Convolution Tree Kernel Function, Feature Extraction