Research On The Text Information Extraction Of Author Relevant Information

Posted on: 2017-12-16
Degree: Master
Type: Thesis
Country: China
Candidate: Z Liu
GTID: 2348330485460024
Subject: Software engineering
Abstract/Summary:
Information extraction is a basic task of natural language understanding, text mining and data analysis. With the arrival of the reading era, the book review field places new requirements on information extraction. Author information is an important factor in this field, so extracting it is becoming increasingly significant. The main points of this paper are as follows:
(1) Analyse the multiple data sources of author information according to the data requirements. To improve speed and precision, the web crawler program is redesigned after comparing data collection methods. The collected data forms the basis for research on author evaluation.
(2) Reduce data noise through data cleansing. A data cleansing architecture is defined, and irrelevant, erroneous and duplicate data are removed using a Drools-based architecture.
(3) Identify the most suitable information extraction method by comprehensively analysing and comparing IE techniques, then improve and optimize that method. A new method based on ME-HMM and ontology technology is also proposed. A standardized data format is then designed to reduce noise and improve the usability of the data.
(4) Design and implement the Author Information Extraction System (AIES), a four-layer system built on data collection, data cleansing and information extraction. In extraction experiments on real data, the system shows a significant improvement in accuracy rate, recall rate and F value over an information extraction system based on a hidden Markov model.
Based on a comprehensive evaluation on book reviews during the research and practice of information extraction, this paper verifies the feasibility, robustness and stability of AIES. The results indicate that the author information extraction method proposed in this paper can meet the new demands of modern information extraction.
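The abstract reports accuracy rate, recall rate and F value without defining them. The following is a minimal sketch of how such scores are commonly computed for an extraction task, assuming the standard set-based definitions, reading "accuracy rate" as precision, and using the balanced F1 measure. The function name `extraction_scores` and the sample (field, value) pairs are hypothetical and are not taken from the thesis.

```python
def extraction_scores(predicted, gold):
    """Return (precision, recall, f1) for extracted items against a gold standard.

    Both arguments are collections of hashable items, for example
    (field, value) pairs. This is an illustrative helper, not AIES code.
    """
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)  # items extracted and also in the gold set

    # Guard against empty inputs to avoid division by zero.
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0

    if precision + recall == 0:
        f1 = 0.0
    else:
        # Balanced F measure: harmonic mean of precision and recall.
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical author fields: what a system extracted vs. what is correct.
predicted = {("name", "A"), ("birth_year", "1950"), ("nationality", "X")}
gold = {("name", "A"), ("birth_year", "1950"), ("award", "Y"), ("work", "Z")}

precision, recall, f1 = extraction_scores(predicted, gold)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Comparing two systems on the same gold data with a function like this is one way a claim such as "improvement in accuracy rate, recall rate and F value" could be checked; the thesis may use a different weighting or matching rule.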
Keywords/Search Tags: Data Extraction, Ontology, Maximum Entropy, HMM