Research Of Internet-Based Information Extraction Technology

Posted on:2006-08-23

Degree:Master

Type:Thesis

Country:China

Candidate:Y J Li

Full Text:PDF

GTID:2168360152975691

Subject:Computer application technology

Abstract/Summary:

With the growth of the Internet, the Web has become one of the most important knowledge repositories, and efficient information extraction from it is highly desirable. How to deliver information from the Internet to users automatically and efficiently has become an important research issue. The information extracted by Information Extraction (IE) systems can be presented directly to end users, and it is also the first step in building intelligent query systems and data mining systems. IE systems have promising prospects, and research on IE techniques has become a focus of Natural Language Processing internationally.

This paper presents the history, key technologies, difficulties and evaluation standards of information extraction, reviews the state of Internet information extraction, and comprehensively compares the existing Internet information extraction technologies.

A new technique for supervised wrapper generation is proposed in this paper. It helps the user create wrapper programs semi-automatically through a fully visual and interactive user interface. Neither manual fine-tuning nor knowledge of the internal language is necessary. Very expressive extraction programs can be created in this interface, and the user works directly and solely on example pages displayed in a browser. The system supports highly expressive visual wrapper generation: target patterns can be extracted based on surrounding landmarks, on the content itself, on HTML attributes, on the order of appearance, and on semantic and syntactic concepts.

This paper also proposes using a Maximum Entropy (ME) model for Chinese chunk parsing. It first defines Chinese chunks and lists all chunk categories and tags used in the model, then discusses how to select useful features, and finally introduces the procedure and algorithms of feature selection.
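For orientation, the general form of a conditional Maximum Entropy model can be written as below. This is the standard textbook formulation and not necessarily the exact notation of the thesis; the symbols c (context of a word), t (chunk tag), f_i (binary feature functions) and \lambda_i (feature weights) are assumptions introduced here for illustration.

```latex
p(t \mid c) = \frac{1}{Z(c)} \exp\Big( \sum_{i} \lambda_i f_i(c, t) \Big),
\qquad
Z(c) = \sum_{t'} \exp\Big( \sum_{i} \lambda_i f_i(c, t') \Big)
```

Under this form, feature selection amounts to deciding which functions f_i are included in the sum, and training estimates the weights \lambda_i.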
This paper uses a set of extraction patterns to automatically locate specific information items and the relations among them. It combines XML with database technology to construct the information database, which further improves the performance of the system. It also describes in detail how web information is represented.

Based on the theoretical analysis, the paper designs and implements a practical system, SBIES (the Sham Battle Information Extraction System), and describes the system in detail. Finally, it tests the model and reports experimental results.
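The idea of locating fields with extraction patterns and representing the result as XML can be illustrated with a minimal sketch. It is not the SBIES implementation: the landmark patterns, field names, sample page and helper functions below are all hypothetical, and the database step is omitted.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical patterns: each field is located by a left and a right landmark regex.
PATTERNS = {
    "title": (r"<h1>", r"</h1>"),
    "date": (r"Date:\s*", r"<"),
}

def extract_between(text, left, right):
    """Return the first string found between the two landmarks, or None."""
    match = re.search(left + r"(.*?)" + right, text, flags=re.DOTALL)
    return match.group(1).strip() if match else None

def extract_record(html, patterns=PATTERNS):
    """Apply every pattern and collect the fields that were found."""
    record = {}
    for field, (left, right) in patterns.items():
        value = extract_between(html, left, right)
        if value is not None:
            record[field] = value
    return record

def record_to_xml(record, tag="item"):
    """Serialize one extracted record as an XML element string."""
    root = ET.Element(tag)
    for field, value in record.items():
        ET.SubElement(root, field).text = value
    return ET.tostring(root, encoding="unicode")

# Made-up sample page used only to exercise the sketch.
sample = "<html><h1> Exercise report </h1><p>Date: 2005-03-01</p></html>"
record = extract_record(sample)
xml_text = record_to_xml(record)
print(xml_text)
```

Keeping the patterns in a plain dictionary separates the extraction rules from the code that applies them, so a new field can be added without touching the functions.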

Keywords/Search Tags:

Internet Information Processing, Information Extraction, Maximum Entropy Principle, Pattern Matching

Related items

1	Application Of Maximum Entropy In The Data Processing Of Soft Sensor
2	The Application Of Information Entropy In Machine Learning Algorithm
3	Maximum Entropy Method And Its Applications In Natural Language Processing
4	Research On Extracting Chinese Entity-relationship Based On Maximum Entropy Model
5	Research On The Text Information Extraction Of Author Relevant Information
6	Information Automatic Extraction And Analysis Approach On Financial Sector
7	Chinese Terminology System Design And Implementation Based On Maximum Entropy
8	Research On Structured Information Extraction Based On Pattern Matching
9	Research On Information Measurement System And Some Problems
10	Research On Technology Of Table Information Extraction In Semi-Structured Texts