Web Information Extraction Based On A Hybrid Of HMM/WNN

Posted on: 2013-10-05
Degree: Master
Type: Thesis
Country: China
Candidate: S T Li
GTID: 2248330374979219
Subject: Computer application technology
Abstract/Summary:
A hybrid model for Web information extraction is presented that combines Hidden Markov Models (HMM) with a Wavelet Neural Network (WNN). The model first characterizes the nodes of a Web page and establishes different HMMs according to the content of the page. The WNN then selects the appropriate HMM for information extraction. Because an HMM alone cannot always extract the important information accurately, the WNN is used as an auxiliary tool for discrimination. Experiments show that the hybrid model can improve the accuracy of Web information extraction. The extraction process of the hybrid model is as follows:
(1) Parse the Web pages. Pages are parsed with regular expressions and their nodes are characterized. Characterizing the nodes requires designing a set of regular expressions and a feature scheme, with reference to the general environment and to the features of the Web information to be extracted. After characterization, every Web page can be mapped to a sequence of feature values, which serves as the input of the hybrid model.
(2) Build the wavelet network models. Three wavelet network models are established, referred to here as WNN1, WNN2, and WNN3. WNN1 computes the observation probability density of the HMM. WNN2 selects a specific HMM from the HMM collection; the selected HMM is then applied to the page from which information is to be extracted. WNN3 is used for extraction when the HMM cannot extract the information well.
(3) Create a collection of HMMs. Each specific type of Web page or Web block corresponds to a specific HMM. Before extraction, the hybrid model establishes a set of HMMs according to the Web environment and the information to be extracted. During both training and actual extraction, if the existing HMMs cannot extract the information well, HMM state nodes are split and a new HMM is generated automatically.
(4) Finally, the hybrid HMM/WNN model is applied in information extraction experiments in a practical Web environment, and the deficiencies of the model and areas for improvement are identified from the experimental results.
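As an illustration of steps (1) and (3) only, the following minimal Python sketch maps a page fragment to a sequence of feature symbols with regular expressions and then labels that sequence with a single hand-set HMM using standard Viterbi decoding. It is not the thesis implementation: the feature scheme (TAG, PRICE, WORD), the two states (OTHER, TARGET), all probabilities, and the function names are assumptions chosen for the example, and the WNN components (WNN1 to WNN3) and the state-splitting step are not modeled.

```python
import math
import re

# Hypothetical feature scheme (an assumption, not taken from the thesis):
# each token of the page is mapped to one symbol by the first matching regex.
TOKEN_RE = re.compile(r"<[^>]+>|[^<\s]+")
FEATURES = [
    ("TAG", re.compile(r"<[^>]+>")),
    ("PRICE", re.compile(r"\$?\d+(\.\d+)?")),
]

def characterize(html):
    """Return (token, symbol) pairs for a page fragment."""
    pairs = []
    for token in TOKEN_RE.findall(html):
        for name, pattern in FEATURES:
            if pattern.fullmatch(token):
                pairs.append((token, name))
                break
        else:
            pairs.append((token, "WORD"))  # fallback symbol
    return pairs

# Toy HMM with hand-set, non-zero probabilities (illustrative values only).
STATES = ("OTHER", "TARGET")
START_P = {"OTHER": 0.9, "TARGET": 0.1}
TRANS_P = {
    "OTHER": {"OTHER": 0.8, "TARGET": 0.2},
    "TARGET": {"OTHER": 0.7, "TARGET": 0.3},
}
EMIT_P = {
    "OTHER": {"TAG": 0.6, "WORD": 0.35, "PRICE": 0.05},
    "TARGET": {"TAG": 0.05, "WORD": 0.15, "PRICE": 0.8},
}

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most probable state path for a non-empty obs (log-space Viterbi)."""
    score = {s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for o in obs[1:]:
        prev, score, pointers = score, {}, {}
        for s in states:
            best = max(states, key=lambda r: prev[r] + math.log(trans_p[r][s]))
            score[s] = prev[best] + math.log(trans_p[best][s]) + math.log(emit_p[s][o])
            pointers[s] = best
        back.append(pointers)
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

def extract(html):
    """Return the tokens that the toy HMM labels as TARGET."""
    pairs = characterize(html)
    path = viterbi([sym for _, sym in pairs], STATES, START_P, TRANS_P, EMIT_P)
    return [tok for (tok, _), state in zip(pairs, path) if state == "TARGET"]

if __name__ == "__main__":
    print(extract("<td>Price</td><td>$19.99</td>"))  # ['$19.99']
```

In the model described above, WNN1 would supply the observation probabilities in place of the fixed emission table used here, and WNN2 would choose among several such HMMs before decoding.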
Keywords/Search Tags: Information Extraction, HMM, WNN