The Research On Web Information Extraction Based On HMM

Posted on: 2006-06-11  Degree: Master  Type: Thesis
Country: China  Candidate: Y Yu  Full Text: PDF
GTID: 2178360212460664  Subject: Computer application technology
Abstract/Summary:
Web Information Extraction (WIE) is the process of turning large volumes of semi-structured web data into structured data that matches a user's extraction requirements. This thesis studies several key problems in web information extraction using hidden Markov models (HMMs).

The thesis first reviews the background and history of information extraction, its evaluation standards, and typical web information extraction systems.

After analyzing the successes and shortcomings of HMM applications in other fields, the thesis applies a clustering method from data mining to merge states, so that the HMM is produced automatically from the data. It also extends the common HMM: a separate HMM is built for each extraction field in order to obtain more useful information. For list-like information, a new algorithm is proposed; in the experiments its precision was satisfactory. In addition, an improved smoothing method is introduced for learning the HMM probabilities, which successfully resolves the parameter problem in state transitions.

The system is designed with the Model-View-Controller (MVC) pattern and implemented with ASP.NET. It has two subsystems: data training and data testing. In data training, the web data is pre-processed and located, then clustered and merged to form the HMM structure, and finally the transition and emission probabilities are computed. In data testing, the web data is pre-processed and located, and then the Viterbi algorithm, the data dictionary, and the HMM structure obtained from data training are used to compute the state path that realizes the web data extraction.
Keywords/Search Tags: Web information extraction, HMM, HTML structure tree, Clustering data
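
The training and testing steps described in the abstract (estimating transition and emission probabilities with smoothing, then decoding with the Viterbi algorithm) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the thesis used ASP.NET, and its improved smoothing method and clustering-based state merging are not specified here, so plain additive (Laplace) smoothing over a fixed set of labeled states is assumed instead. The function names `train_hmm` and `viterbi`, the `alpha` parameter, and the `<unk>` token are all hypothetical.

```python
import math
from collections import defaultdict


def train_hmm(sequences, alpha=1.0):
    """Estimate HMM parameters from sequences of (token, state) pairs.

    Assumption: additive smoothing with constant `alpha` stands in for the
    thesis's (unspecified) improved smoothing method.
    """
    states = sorted({s for seq in sequences for _, s in seq})
    # "<unk>" reserves emission mass for tokens never seen in training.
    vocab = sorted({t for seq in sequences for t, _ in seq}) + ["<unk>"]

    start_c = defaultdict(float)
    trans_c = defaultdict(float)
    emit_c = defaultdict(float)
    for seq in sequences:
        if not seq:
            continue
        start_c[seq[0][1]] += 1
        for tok, st in seq:
            emit_c[(st, tok)] += 1
        for (_, a), (_, b) in zip(seq, seq[1:]):
            trans_c[(a, b)] += 1

    n_seq = sum(start_c.values())
    start = {s: (start_c[s] + alpha) / (n_seq + alpha * len(states))
             for s in states}
    trans, emit = {}, {}
    for a in states:
        out = sum(trans_c[(a, b)] for b in states)
        trans[a] = {b: (trans_c[(a, b)] + alpha) / (out + alpha * len(states))
                    for b in states}
        tot = sum(emit_c[(a, t)] for t in vocab)
        emit[a] = {t: (emit_c[(a, t)] + alpha) / (tot + alpha * len(vocab))
                   for t in vocab}
    return states, start, trans, emit


def viterbi(tokens, states, start, trans, emit):
    """Return the most likely state path for `tokens` (log-space Viterbi)."""
    if not tokens:
        return []

    def log_emit(s, tok):
        # Unseen tokens fall back to the smoothed "<unk>" probability.
        return math.log(emit[s].get(tok, emit[s]["<unk>"]))

    score = {s: math.log(start[s]) + log_emit(s, tokens[0]) for s in states}
    back = []  # back[i][s] = best predecessor of state s at position i + 1
    for tok in tokens[1:]:
        prev, score, ptr = score, {}, {}
        for s in states:
            best = max(states, key=lambda p: prev[p] + math.log(trans[p][s]))
            score[s] = prev[best] + math.log(trans[best][s]) + log_emit(s, tok)
            ptr[s] = best
        back.append(ptr)

    # Backtrack from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path
```

A usage example on toy data (the tokens and state names are invented for illustration):

```python
training = [
    [("Title:", "label"), ("HMM", "title"), ("Price:", "label"), ("10", "price")],
    [("Title:", "label"), ("Web", "title"), ("Price:", "label"), ("20", "price")],
]
states, start, trans, emit = train_hmm(training)
print(viterbi(["Title:", "HMM", "Price:", "10"], states, start, trans, emit))
```

Working in log space avoids numeric underflow on long token sequences, and smoothing keeps every transition and emission probability nonzero so that the logarithms are always defined.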