Design And Implementation Of Large-Scale Open Information Extraction System

Posted on:2018-06-23

Degree:Master

Type:Thesis

Country:China

Candidate:G F Luo

GTID:2348330536460846

Subject:Software engineering

Abstract/Summary:

The World Wide Web contains a significant amount of information and knowledge expressed in natural language. Information Extraction (IE) is the task of mapping textual content into a structured knowledge base. Traditional IE systems use rule-matching methods to extract specific kinds of knowledge from small collections of domain-specific text. These systems have many limitations; for example, acquiring a new kind of knowledge requires designing new extraction rules. Extracting information from the Web therefore poses several challenges for existing IE systems, and it is necessary to design and implement an open information extraction system that can extract open categories of entities, relations and entity links from open-domain text resources.

This thesis presents Open Information Extraction, a system that performs named entity recognition, entity relation extraction and entity linking. The system consists of three modules: (1) extraction task management; (2) open information extraction; (3) extraction result management. The first module provides data uploading, task publishing, task launching and real-time task monitoring. The second module first extracts the main body text from web pages and then extracts named entities, relations and entity links from that text. The last module is responsible for persisting and visualizing the extraction results.

Following the software development process, the thesis describes the system's requirements analysis, design and implementation, and testing in detail. The system is built on a set of open-source Natural Language Processing tools. It uses message queues and multithreading to process large text corpora concurrently and can extract thousands of named entities, relations and entity links. In addition, the system implements common Chinese text processing functions such as word segmentation, part-of-speech tagging and keyword extraction, and it provides a visual user interface through which users can understand the results of Chinese text processing more intuitively.
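The summary states that the system combines message queues and multithreading to process corpora concurrently, and that the second module runs main-text extraction followed by entity, relation and link extraction, but it does not describe the implementation. The following is a minimal sketch, assuming a producer/consumer design built on Python's standard `queue` and `threading` modules, of how those stages could be chained behind a task queue. Everything named here is hypothetical and not taken from the thesis: `ENTITY_LEXICON` is a toy lookup standing in for a real NER model, the adjacent co-occurrence rule in `extract_relations` stands in for real relation extraction, and the `kb:` identifiers in `KNOWLEDGE_BASE` are invented. The open-source NLP tools the system actually uses are not named in this summary.

```python
import queue
import threading
from dataclasses import dataclass
from html.parser import HTMLParser

# Hypothetical stand-ins for real NLP resources (not from the thesis).
ENTITY_LEXICON = {"Jack Ma": "PER", "Alibaba": "ORG", "Hangzhou": "LOC"}
KNOWLEDGE_BASE = {"Alibaba": "kb:Alibaba_Group", "Hangzhou": "kb:Hangzhou"}


class _TextCollector(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def extract_text(html):
    """Stage 1: reduce a web page to its visible text."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    return " ".join(collector.parts)


def recognize_entities(text):
    """Stage 2: toy lexicon lookup in place of a statistical NER model."""
    found = sorted((text.find(name), name, label)
                   for name, label in ENTITY_LEXICON.items() if name in text)
    return [(name, label) for _, name, label in found]  # in order of mention


def extract_relations(entities):
    """Stage 3: toy heuristic relating each entity to the next one mentioned."""
    return [(a, "co_occurs_with", b)
            for (a, _), (b, _) in zip(entities, entities[1:])]


def link_entities(entities):
    """Stage 4: map mentions to knowledge-base IDs (None = no match)."""
    return {name: KNOWLEDGE_BASE.get(name) for name, _ in entities}


@dataclass
class ExtractionResult:
    doc_id: str
    entities: list
    relations: list
    links: dict


def _worker(tasks, results):
    while True:
        item = tasks.get()
        if item is None:  # sentinel: no more documents for this worker
            return
        doc_id, html = item
        entities = recognize_entities(extract_text(html))
        results.put(ExtractionResult(doc_id, entities,
                                     extract_relations(entities),
                                     link_entities(entities)))


def run_pipeline(documents, num_workers=4):
    """Fan (doc_id, html) pairs out to worker threads via a task queue."""
    tasks, results = queue.Queue(), queue.Queue()
    workers = [threading.Thread(target=_worker, args=(tasks, results))
               for _ in range(num_workers)]
    for w in workers:
        w.start()
    for doc in documents:
        tasks.put(doc)
    for _ in workers:
        tasks.put(None)  # one sentinel per worker so every thread exits
    for w in workers:
        w.join()
    # All workers have exited, so qsize() is exact and draining is safe.
    collected = [results.get() for _ in range(results.qsize())]
    return sorted(collected, key=lambda r: r.doc_id)
```

For example, `run_pipeline([("d1", "<p>Jack Ma founded Alibaba in Hangzhou.</p>")], num_workers=2)` returns one result whose entities are `[("Jack Ma", "PER"), ("Alibaba", "ORG"), ("Hangzhou", "LOC")]` and whose link table maps `"Jack Ma"` to `None`, because the toy knowledge base has no entry for it. In a production deployment the in-process `queue.Queue` would more likely be an external message broker, and because of CPython's global interpreter lock, CPU-bound NLP stages may scale better across processes than across threads.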

Keywords/Search Tags:

Open Information Extraction, Named Entity Extraction, Entity Linking

Related items

1	Research And Implementation Of Named Entity Disambiguation Based On Wikipedia
2	Named Entity Linking Based On Multisource Knowledge
3	Research Of Entity Knowledge Base System Based On Information Extraction
4	Joint Extraction Of Named Entity Recognition And Entity Relationship Based On Neural Network
5	Research On Automatic Extraction Of Chinese Named Entities And Entity Relations
6	Research On Context Aware Entity Linking
7	Research On The Information Extraction System In Sports Domain
8	Research On Named Entity Relation Extraction Based On Web Text Mining
9	Entity Analysis with Weak Supervision: Typing, Linking, and Attribute Extraction
10	Engineering Construction Of Text Named Entity Recognition And Topic Extraction Based On Information Extraction Technology