Research And Implement Of The Technology For Finding Specified Domain Attributes And Values

Posted on:2019-03-26

Degree:Master

Type:Thesis

Country:China

Candidate:Y X Shen

GTID:2428330545951212

Subject:Computer technology

Abstract/Summary:

With the rapid development of the mobile Internet, the Internet of Things, cloud computing, and other technologies in recent years, network applications have emerged one after another, and the data they produce has grown explosively. Faced with such a large amount of data, how to derive valuable knowledge from it and make full use of it through deep computation and analysis has become a hot research topic. These applications produce massive amounts of data every day, much of it text, and the development of artificial intelligence relies heavily on understanding this text. Open Information Extraction aims to extract structured information from free text, and knowledge bases (KBs) play an important role in this task.

This thesis contributes to extending domain knowledge bases by automatically extracting domain attributes and attribute values from a text corpus. We study the extraction of structured data from text and implement an extraction system called DAVE. Specifically, our work covers the following aspects:

1. For data collection, we design and implement a web data collection framework consisting of a web crawler and a text filter. The crawler downloads web pages of a specified domain using multiple threads and extracts the domain text corpus based on page features. The text filter removes text unrelated to the specified domain using keyword patterns and machine learning. We use the framework to store the text in a database.
2. We propose an effective graph-based iterative extraction approach based on the co-occurrence of attribute terms and attribute value terms in the same sentence. The process is repeated until no more attributes or values can be identified.
3. A CNN-based model is also developed to remove noise from the extraction results. To improve extraction quality, the model introduces features of nodes in the co-occurrence graph, such as node degree, random walk score, and features of adjacent nodes.

In summary, this thesis studies structured data extraction from text corpora, proposes an algorithm for finding new attributes and values of a specified domain to extend the initial KB, and implements a prototype that achieves high extraction quality. DAVE thus also makes a practical contribution.
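
The co-occurrence-based iterative extraction described in the abstract can be illustrated with a minimal sketch. This is not the DAVE implementation: the function names, the whitespace tokenizer, the pre-supplied candidate term sets, and the `min_count` threshold are all assumptions made for illustration. The sketch counts how often a candidate attribute term and a candidate value term appear in the same sentence, then alternately expands the known attributes and values from a seed set until nothing new is found.

```python
from collections import Counter


def cooccurrence_counts(sentences, attr_candidates, value_candidates):
    """Count sentence-level co-occurrences of (attribute, value) candidate pairs.

    The candidate sets are assumed to be given; a real system would have to
    derive them from the corpus.
    """
    counts = Counter()
    for sentence in sentences:
        tokens = set(sentence.lower().split())  # naive tokenizer (assumption)
        for attr in tokens & attr_candidates:
            for value in tokens & value_candidates:
                counts[(attr, value)] += 1
    return counts


def iterative_extract(counts, seed_attrs, seed_values=(), min_count=1):
    """Expand seed attributes/values along co-occurrence edges to a fixed point."""
    attrs, values = set(seed_attrs), set(seed_values)
    changed = True
    while changed:  # stops once a full pass adds no new term
        changed = False
        for (attr, value), n in counts.items():
            if n < min_count:
                continue  # ignore weak edges (hypothetical threshold)
            if attr in attrs and value not in values:
                values.add(value)
                changed = True
            if value in values and attr not in attrs:
                attrs.add(attr)
                changed = True
    return attrs, values


# Toy corpus and candidate sets, invented purely for this example.
sentences = [
    "The screen is amoled",
    "battery 4000mah and screen amoled",
    "weight 180g with battery 4000mah",
    "price unrelated sentence",
]
attr_candidates = {"screen", "battery", "weight", "price"}
value_candidates = {"amoled", "4000mah", "180g"}

counts = cooccurrence_counts(sentences, attr_candidates, value_candidates)
attrs, values = iterative_extract(counts, seed_attrs={"screen"})
print(sorted(attrs), sorted(values))
```

Starting from the single seed attribute `screen`, the loop reaches `battery` and `weight` through shared value terms, while `price` is never linked to any value and stays out. Sentence-level co-occurrence also produces spurious edges such as (`battery`, `amoled`), which is the kind of noise a later filtering stage, like the CNN-based model mentioned in the abstract, would need to remove.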

Keywords/Search Tags:

Open Information Extraction, Knowledge Base, Domain Attributes, Convolutional Neural Network, Data Collection

Related items

1	Research On Open Domain Question Answering Based On Knowledge Base
2	Design And Implementation Of Resume Information Extraction System Based On Domain Knowledge Base
3	Neural Network-based Open Information Extraction And Its Application
4	Design And Realization Of Domain Specific Knowledge Base Extraction Syste
5	Clause Based Open Domain Information Extraction
6	Design And Building Of The Domain-specific Knowledge Base System For Internet Videos
7	Building A Relation Knowledge Base For Open Information Extraction
8	Research And Design Of Key Technology Of The Domain-specific Knowledge Base System For Vertical Searching
9	Construction Of System Engineering Documents Based Domain Knowledge Base
10	Design And Implementation Of Chinese Knowledge Engineering And Knowledge Service Platform