Research on Web Text Information Extraction Based on an HMM and BP Neural Network Hybrid Model

Posted on: 2012-11-20
Degree: Master
Type: Thesis
Country: China
Candidate: H C Yang
Full Text: PDF
GTID: 2208330335491376
Subject: Computer application technology
Abstract/Summary:
With the development of the Internet, the amount of information on the web is growing explosively. This information is disordered, and users find it hard to retrieve what they need on their own, which drives the need for information extraction from web text. However, the information extraction techniques mainly used at present have problems such as poor adaptability and weak statistical capability, and these lead to low recall and accuracy. After analyzing these problems, this thesis proposes a hybrid model that addresses them better and improves extraction quality. The thesis describes the two main techniques used in text information extraction, the Hidden Markov Model (HMM) and the BP network, and analyzes the advantages and disadvantages of each. The HMM is a good statistical model with strong temporal modeling capability. Its dynamic and modeling capabilities have made it successful in many fields, but it adapts poorly and requires a large amount of training data. The BP network has good decision-making ability, describes uncertain information well, and is self-adaptive, but its temporal modeling is weak and it requires specific input conditions. On the basis of this analysis, we study how to combine the hidden Markov model with the neural network model for information extraction in order to improve on current accuracy and recall.
The two models are found to have complementary strengths and weaknesses: combining the HMM with the BP network can overcome the HMM's weakness in classification and its lack of adaptability, and can also compensate for the BP network's need for specific input and its weak modeling capability. Building on an analysis of earlier improvements to information extraction techniques, this thesis adopts a text-block-based extraction method. The text is first labeled manually and then used to train a multi-state HMM. The optimal state output probabilities of the trained HMM serve as the input to the BP network, and the BP network's strong classification ability is used to map the states to text classes. Experimental results show that the hybrid model improves classification accuracy by about 15% over the traditional BP network model or the HMM used alone. In addition, based on an analysis of the experimental results and the network structure, the BP algorithm of the network is improved, which raises accuracy by about 4% for states whose classification results are unclear or easily confused during extraction.
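The pipeline described in the abstract (HMM state probabilities fed into a BP network that assigns a class to each text block) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the thesis: the two-state HMM with fixed parameters `pi`, `A`, `B` (the thesis trains its HMM on manually labeled text), the three-symbol observation alphabet, the synthetic data, and the reading of "optimal state output probabilities" as per-block state posteriors from the forward-backward algorithm. The small network is trained by backpropagation with a softmax output, which may differ from the thesis's BP variant.

```python
import numpy as np

def sample_hmm(T, pi, A, B, rng):
    """Draw a synthetic state/observation sequence (stand-in for labeled text blocks)."""
    states, obs = [], []
    s = rng.choice(len(pi), p=pi)
    for _ in range(T):
        states.append(s)
        obs.append(rng.choice(B.shape[1], p=B[s]))
        s = rng.choice(len(pi), p=A[s])
    return np.array(states), np.array(obs)

def state_posteriors(obs, pi, A, B):
    """Scaled forward-backward: P(state at t | whole sequence) for every t."""
    T, N = len(obs), len(pi)
    alpha, beta, scale = np.zeros((T, N)), np.ones((T, N)), np.zeros(T)
    alpha[0] = pi * B[:, obs[0]]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]
    for t in range(T - 2, -1, -1):
        beta[t] = (A @ (B[:, obs[t + 1]] * beta[t + 1])) / scale[t + 1]
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)

def softmax(Z):
    E = np.exp(Z - Z.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)

def train_bp(X, y, hidden=8, classes=2, lr=0.5, epochs=1000, seed=0):
    """One-hidden-layer network trained by plain batch backpropagation."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.5, (hidden, classes))
    b2 = np.zeros(classes)
    Y = np.eye(classes)[y]                      # one-hot targets
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                # forward pass
        P = softmax(H @ W2 + b2)
        dZ2 = (P - Y) / len(X)                  # cross-entropy gradient at the output
        dH = (dZ2 @ W2.T) * (1.0 - H ** 2)      # error propagated back through tanh
        W2 -= lr * (H.T @ dZ2)
        b2 -= lr * dZ2.sum(axis=0)
        W1 -= lr * (X.T @ dH)
        b1 -= lr * dH.sum(axis=0)
    return W1, b1, W2, b2

def predict(X, params):
    W1, b1, W2, b2 = params
    return softmax(np.tanh(X @ W1 + b1) @ W2 + b2).argmax(axis=1)

# Hypothetical HMM parameters (assumed, not from the thesis).
pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.1, 0.9]])
B = np.array([[0.80, 0.15, 0.05], [0.05, 0.15, 0.80]])

rng = np.random.default_rng(42)
labels, obs = sample_hmm(300, pi, A, B, rng)    # labels play the role of manual marks
X = state_posteriors(obs, pi, A, B)             # HMM output becomes the BP input
params = train_bp(X, labels)
pred = predict(X, params)
acc = float((pred == labels).mean())
print("training accuracy:", acc)
```

In this sketch the HMM contributes sequence context (each posterior depends on neighboring blocks) and the network contributes the final decision boundary, which mirrors the division of labor the abstract argues for. Accuracy is measured on the training data only, so it says nothing about the 15% and 4% figures reported above.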
Keywords/Search Tags:Information Extraction, HMM, BPN, Hybrid model