
Application Research Of Hidden Markov Model In Information Extraction

Posted on: 2008-03-25 | Degree: Master | Type: Thesis
Country: China | Candidate: Y N Wang | Full Text: PDF
GTID: 2178360242467569 | Subject: Software engineering
Abstract/Summary:
Although China's auto market is currently prosperous, intensifying competition among the major automobile manufacturers keeps pushing car prices down. Profits on new cars have shrunk noticeably, so some major manufacturers are paying more attention to professional services, especially online services in the field of automotive technology. Information extraction technology supports such services, for example by automatically supplying users with more information and knowledge related to buying a car, along with other advanced features. Improving the application of this technology is therefore the starting point and focus of this paper.

The subject of this paper is an online automotive service system, a kind of system that is not yet mature. It comprises a website that serves as the interface for interacting with users, a customized crawler process, and an information extraction model that obtains useful information from the preprocessed data and presents it to users. The paper focuses on applying the Hidden Markov Model (HMM) to information extraction. It first introduces the composition and main algorithms of the HMM, including the forward-backward algorithm, the Baum-Welch algorithm and the Viterbi algorithm. It then presents a second-order HMM (HMM2), which extends the two basic assumptions of the HMM so that the model considers more than the current state alone. This model can improve both the overall learning ability and the accuracy of information extraction.

A service website based on a DWR + Tomcat architecture is built. The paper first introduces the website's background and overall functional structure, then concentrates on the functional requirements and design of the information extraction module. In applying the HMM to this module, improvements are made mainly in four respects: improved smoothing of unknown observation values, string clustering, horizontal and vertical combination of special states, and the use of rules for information extraction.

The final part of the paper describes the development configuration and the methods for page collection and preprocessing. After explaining how the HMM is built and trained, it shows the results of processing text with the HMM. Finally, experiments comparing the improved smoothing algorithm and the improved HMM with the traditional algorithms show the improvement achieved by the new system.
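To make the decoding step concrete, the following is a minimal sketch of Viterbi decoding with a first-order HMM whose emission probabilities are smoothed so that unseen tokens do not receive zero probability. It is not the thesis's implementation: the states (`name`, `price`), the toy counts and probabilities, and the function names are all hypothetical, and plain additive (add-alpha) smoothing stands in for the improved smoothing algorithm, which the abstract does not describe.

```python
import math

def smoothed_emission(counts, token, vocab_size, alpha=1.0):
    """Additive smoothing: unseen tokens get a small nonzero probability."""
    total = sum(counts.values())
    return (counts.get(token, 0) + alpha) / (total + alpha * vocab_size)

def viterbi(obs, states, start_p, trans_p, emit_counts, vocab_size):
    """Return the most likely state sequence for obs (computed in log space)."""
    def log_emit(state, token):
        return math.log(smoothed_emission(emit_counts[state], token, vocab_size))

    # Initialise with the start distribution and the first observation.
    score = {s: math.log(start_p[s]) + log_emit(s, obs[0]) for s in states}
    backpointers = []
    for token in obs[1:]:
        new_score, pointers = {}, {}
        for s in states:
            # Best predecessor of state s at this time step.
            prev = max(states, key=lambda p: score[p] + math.log(trans_p[p][s]))
            new_score[s] = score[prev] + math.log(trans_p[prev][s]) + log_emit(s, token)
            pointers[s] = prev
        score = new_score
        backpointers.append(pointers)

    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

# Hypothetical toy model: label tokens of a car listing as "name" or "price".
states = ("name", "price")
start_p = {"name": 0.8, "price": 0.2}
trans_p = {"name": {"name": 0.6, "price": 0.4},
           "price": {"name": 0.3, "price": 0.7}}
emit_counts = {"name": {"audi": 3, "a4": 2, "sedan": 1},
               "price": {"300000": 2, "yuan": 3}}
vocab_size = 6  # five known tokens plus one slot for unknown tokens

# "280000" never appears in the counts; smoothing keeps decoding possible.
labels = viterbi(["audi", "a4", "280000", "yuan"],
                 states, start_p, trans_p, emit_counts, vocab_size)
print(labels)  # ['name', 'name', 'price', 'price']
```

In this sketch the unseen token `280000` is labeled by context alone, because smoothing gives it a similar emission probability in both states. A second-order model such as HMM2 would condition each transition on the two preceding states instead of one.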
Keywords/Search Tags: Hidden Markov Model, Information Extraction, Auto service website