Research and Implementation of the Technology for Finding Specified Domain Attributes and Values

Posted on: 2019-03-26
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Shen
Full Text: PDF
GTID: 2428330545951212
Subject: Computer technology
Abstract/Summary:
With the rapid development of the mobile Internet, the Internet of Things, cloud computing, and related technologies in recent years, network applications have emerged one after another, and the data they produce has grown explosively. How to derive valuable knowledge from such a large amount of data through deep computation and analysis has become a hot research topic. These applications produce massive amounts of data every day, much of it text, and the development of artificial intelligence relies heavily on understanding this text. Open Information Extraction aims to extract structured information from free text. Knowledge bases (KBs) play an important role. This thesis extends a domain knowledge base by automatically extracting domain attributes and attribute values from a text corpus. We study the extraction of structured data from text and implement an extraction system called DAVE. Specifically, our work covers the following aspects:

1. For data collection, we design and implement a web data collection framework with two components: a multithreaded web crawler that downloads web pages of the specified domain and extracts the domain text corpus based on page features, and a text filter that uses keyword patterns and machine learning to discard text unrelated to the specified domain. The framework stores the collected text in a database.
2. We propose an effective graph-based iterative extraction approach based on the co-occurrence of attribute terms and attribute value terms in the same sentence. The process is repeated until no more attributes or values can be identified.
3. A CNN-based model is developed to remove noise from the extraction results. To improve extraction quality, the model introduces features of the nodes in the co-occurrence graph, such as node degree, random walk score, and features of adjacent nodes.

In summary, this thesis studies structured data extraction from text corpora, proposes an algorithm that finds new attributes and values of a specified domain to extend the initial KB, and implements a prototype that achieves high extraction quality. DAVE thus also makes a practical contribution.
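
The graph-based iterative extraction described in point 2 can be illustrated with a minimal Python sketch. The abstract does not give DAVE's actual algorithm, so everything here is an assumption made for illustration: the function names (`build_cooccurrence_graph`, `iterative_extract`, `node_degree`), the pre-tokenized input, and the deliberately naive labeling rule (an unlabeled term that shares a sentence with a known attribute becomes a value, and one that shares a sentence with a known value becomes an attribute). The loop stops when a round finds nothing new, mirroring "until no more attributes or values can be identified".

```python
from collections import defaultdict


def build_cooccurrence_graph(sentences):
    """Link every pair of distinct candidate terms found in the same sentence.

    `sentences` is a list of term lists; term detection is assumed to be
    done upstream and is not part of this sketch.
    """
    graph = defaultdict(set)
    for terms in sentences:
        unique = set(terms)
        for term in unique:
            graph[term] |= unique - {term}
    return graph


def iterative_extract(graph, seed_attributes, seed_values):
    """Expand seed attributes/values over the graph until a fixed point.

    Hypothetical rule (not taken from the thesis): unlabeled neighbors of
    attributes are labeled as values, unlabeled neighbors of values are
    labeled as attributes.
    """
    attributes, values = set(seed_attributes), set(seed_values)
    while True:
        labelled = attributes | values
        near_attrs = set().union(*(graph.get(a, set()) for a in attributes))
        near_values = set().union(*(graph.get(v, set()) for v in values))
        new_values = near_attrs - labelled
        new_attrs = near_values - labelled - new_values
        if not new_values and not new_attrs:
            return attributes, values
        values |= new_values
        attributes |= new_attrs


def node_degree(graph, term):
    """Number of distinct co-occurring terms: one of the node features
    the abstract lists as input to the noise-removal model."""
    return len(graph.get(term, set()))


# Toy corpus: each inner list holds the candidate terms of one sentence.
sentences = [
    ["color", "black"],
    ["color", "silver"],
    ["finish", "black"],
    ["finish", "matte"],
]
graph = build_cooccurrence_graph(sentences)
attributes, values = iterative_extract(graph, {"color"}, {"black"})
print(sorted(attributes), sorted(values))
```

A rule this permissive labels every term connected to a seed, which is exactly the kind of noise the CNN-based filter in point 3 is meant to remove; a real system would need scoring or thresholds before accepting a new term.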
Keywords/Search Tags: Open Information Extraction, Knowledge Base, Domain Attributes, Convolutional Neural Network, Data Collection