Design And Implementation Of Large-Scale Open Information Extraction System

Posted on: 2018-06-23
Degree: Master
Type: Thesis
Country: China
Candidate: G F Luo
Full Text: PDF
GTID: 2348330536460846
Subject: Software engineering
Abstract/Summary:
The World Wide Web contains a significant amount of information and knowledge expressed in natural language. Information Extraction (IE) is the task of mapping textual content into a structured knowledge base. Traditional information extraction systems use rule-matching methods to extract specific knowledge from a small collection of domain-specific text. These systems have many limitations; for example, obtaining a new kind of knowledge requires designing new extraction rules. Extracting information from the Web therefore presents several challenges for existing IE systems, and it is necessary to design and implement an open information extraction system that can extract open categories of entities, relations and entity links from open-domain text resources. This thesis presents an Open Information Extraction system that performs named entity recognition, entity relation extraction and entity linking. The system is composed of three modules: (1) extraction task management; (2) open information extraction; (3) extraction result management. The first module provides data uploading, task publishing, task starting and real-time task monitoring. The second module first extracts the effective text from a web page and then extracts named entities, relations and entity links from that text. The last module is responsible for persisting and visualizing the extraction results. Following the software development process, this thesis describes the system in detail: requirements analysis, design and implementation, and testing. The system is built on a series of open-source Natural Language Processing tools. It uses message queues and multithreading, so it can process large text corpora concurrently and extract thousands of named entities, relations and entity links. In addition, the system implements some common Chinese text processing functions, such as word segmentation, part-of-speech tagging and keyword extraction. The system also provides a visual user interface that lets the user understand the results of Chinese text processing more intuitively.
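The abstract credits message queues and multithreading for concurrent processing but gives no implementation detail. The following is only a minimal sketch of that general pattern, not the thesis's actual design. It assumes an in-process `queue.Queue` in place of whatever message queue the system really uses, and `extract_entities` is a hypothetical toy stand-in (a capitalized-word regex) for the unnamed open-source NLP tools. The names `worker` and `run_pipeline` are likewise illustrative.

```python
import queue
import re
import threading


def extract_entities(text):
    """Hypothetical stand-in for a real NER tool: returns capitalized words."""
    return re.findall(r"\b[A-Z][a-z]+\b", text)


def worker(tasks, results):
    """Consume (doc_id, text) items until a None sentinel is received."""
    while True:
        item = tasks.get()
        if item is None:  # sentinel: no more work for this thread
            break
        doc_id, text = item
        results.put((doc_id, extract_entities(text)))


def run_pipeline(documents, num_workers=4):
    """Process a {doc_id: text} mapping concurrently; return {doc_id: entities}."""
    tasks = queue.Queue()    # assumption: stands in for an external message queue
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(tasks, results))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for item in documents.items():
        tasks.put(item)
    for _ in threads:        # one sentinel per worker thread
        tasks.put(None)
    for t in threads:
        t.join()

    extracted = {}
    while not results.empty():  # safe here: all workers have finished
        doc_id, entities = results.get()
        extracted[doc_id] = entities
    return extracted
```

Calling `run_pipeline({"d1": "Alice met Bob in Paris."}, num_workers=2)` distributes documents across worker threads. In a deployment like the one the abstract describes, the in-process queue would presumably be replaced by a message broker so that task publishing and extraction can run as separate components.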
Keywords/Search Tags:Open Information Extraction, Named Entity Extraction, Entity Linking