The Research And Implementation Of Web Text Classification That Uses Table Information

Posted on:2009-08-10

Degree:Master

Type:Thesis

Country:China

Candidate:H X Gui

GTID:2178360272463922

Subject:Computer application technology

Abstract/Summary:

With the development and widespread use of the Internet and other information technologies, the Web has become one of the most important channels for obtaining information. Finding ways to search and classify documents quickly and precisely within this huge body of information is an urgent problem. Techniques for extracting information from Web documents and for classifying Web text automatically are considered essential components of information processing, and they are attracting growing attention.

First, this thesis studies Web information extraction and presents a new model that extracts information from tables in Web documents based on table structure. The model consists of a table positioning module, a table structure preprocessing module, and a table information extraction and restructuring module. It extracts information from tables according to Web table structure tags and user-defined heuristic rules. Experimental results show that the model is well suited to extracting information from tables in Web documents.

Next, drawing on Web text classification techniques and ontology theory, we build a domain ontology from the characteristic information of Web tables and design a two-pass ("Two Times") classification model for Web text. In the first pass, the model classifies test data with a Support Vector Machine (SVM) classifier. For test data whose categories are not confirmed in the first pass, the second pass extracts the characteristic information of Web tables from them and performs similarity matching against a classification model based on the domain ontology. Finally, we compare the Two Times classification model with the Support Vector Machine classification model experimentally and find that precision and recall improve significantly, which supports the validity of the model.
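The two-pass control flow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the confidence threshold, the Jaccard similarity measure, the `classify_two_stage` function, the toy first-stage classifier, and the toy ontology are all hypothetical. The abstract does not say how a category is judged "not confirmed" or which similarity measure is used.

```python
from typing import Callable, Dict, Set, Tuple

# Hypothetical first-pass interface: returns (label, confidence in [0, 1]).
# In the thesis this role is played by an SVM classifier.
FirstStage = Callable[[str], Tuple[str, float]]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Set-overlap similarity; an assumed stand-in for the similarity matching."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def classify_two_stage(
    text: str,
    table_terms: Set[str],
    first_stage: FirstStage,
    ontology: Dict[str, Set[str]],
    threshold: float = 0.6,  # assumed cut-off for a "confirmed" category
) -> Tuple[str, str]:
    """Return (label, stage), where stage is 'first' or 'second'."""
    label, confidence = first_stage(text)
    if confidence >= threshold:
        return label, "first"
    # Second pass: match terms extracted from the page's tables against
    # the term set attached to each category in the domain ontology.
    best = max(
        ontology,
        key=lambda c: jaccard(table_terms, ontology[c]),
        default=label,
    )
    if jaccard(table_terms, ontology.get(best, set())) == 0.0:
        return label, "first"  # no ontology evidence; keep the first-pass guess
    return best, "second"


def toy_first_stage(text: str) -> Tuple[str, float]:
    # Toy stand-in for the SVM: confident only when "price" appears.
    return ("shopping", 0.9) if "price" in text else ("news", 0.3)


# Toy ontology: category name -> characteristic table terms.
toy_ontology = {
    "sports": {"team", "score", "league"},
    "finance": {"stock", "price", "index"},
}

print(classify_two_stage("weekly results", {"team", "score"},
                         toy_first_stage, toy_ontology))
```

In this sketch a document is passed to the ontology matcher only when the first pass is not confident, so the second pass can change only those uncertain decisions.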

Keywords/Search Tags:

information extraction, classification of Web text, heuristic method rules, domain ontology, similarity matching

Related items

1	Automatic Generation And Applications Of Ontology
2	Information Filtering Technologies Based On Heuristic Rules And Text Classification
3	Ontology-Based Structured Information Extraction From Web Pages
4	A Research On Chinese Information Extraction Based On Construction Of Domain Ontology
5	The Research Of Ontology Matching Based On Text Classification
6	Heuristic rules for extraction of ontology from Web pages in WebOntEx
7	Adaptive Web Information Extraction Method Research Based On Ontology
8	Construction And Implementation Of Domain Ontology Based On Plain Text
9	Research On Domain Ontology Construction And Fine-grained Entity Classification Methods Based On Sparse Labeling
10	Research On Text Classification Based On Domain Ontology