Research Of Web Text Mining Technology Based On Hidden Markov Model

Posted on: 2008-07-09
Degree: Master
Type: Thesis
Country: China
Candidate: L M Zou
GTID: 2178360218953463
Subject: Computer application technology
Abstract/Summary:
With the development of network technology, the information on the Internet is growing rapidly and is massive, heterogeneous in structure, and dynamic. How to find potentially useful knowledge in it has become a new research direction. Web text mining is the technique of automatically discovering and extracting information and knowledge from Web documents and services using data mining technology. In processing network information, it is an important way to speed up information retrieval and improve its accuracy.

The paper introduces the common techniques and classifications of Web mining and describes the process of Web text mining: text feature representation and extraction, text information extraction, classification, clustering, association rules, and so on. It then introduces representative algorithms. After comparing different machine learning methods, the paper proposes a Web text mining method based on the Hidden Markov Model (HMM). It describes the collection of the experimental training dataset, the basic components of an HMM, and the three basic problems of HMMs with their representative algorithms. From the labeled training dataset, it constructs the HMM with the maximum likelihood algorithm. After deep parsing of the paper items in the experimental dataset, it successfully extracts the information for the different fields (domains) of the test dataset, and the experimental results show that the method is feasible.

For the unlabeled training dataset, the paper proposes a Web text mining method based on a genetic algorithm and the Hidden Markov Model. The method constructs the HMM with the Baum-Welch algorithm. Because Baum-Welch is itself a gradient-descent-style (hill-climbing) training algorithm, HMM training suffers from local optima and from sensitivity to the initial parameters.
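The supervised step described above, building an HMM from labeled data by maximum likelihood and then using it to extract fields, can be sketched as follows. This is a minimal illustration and not the thesis's implementation: the field labels (`title`, `author`, `other`), the toy token sequences, the add-one smoothing, and the `<unk>` token for unseen words are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

def train_hmm(tagged_seqs, smoothing=1.0):
    """Maximum-likelihood HMM from labeled (word, state) sequences.

    Probabilities are relative counts; add-`smoothing` is an assumption
    of this sketch so that unseen events keep non-zero probability.
    """
    states = sorted({s for seq in tagged_seqs for _, s in seq})
    vocab = sorted({w for seq in tagged_seqs for w, _ in seq}) + ["<unk>"]
    start, trans, emit = Counter(), defaultdict(Counter), defaultdict(Counter)
    for seq in tagged_seqs:
        start[seq[0][1]] += 1                      # initial-state counts
        for word, state in seq:
            emit[state][word] += 1                 # emission counts
        for (_, a), (_, b) in zip(seq, seq[1:]):
            trans[a][b] += 1                       # transition counts

    def normalise(counts, keys):
        total = sum(counts[k] for k in keys) + smoothing * len(keys)
        return {k: (counts[k] + smoothing) / total for k in keys}

    pi = normalise(start, states)
    A = {s: normalise(trans[s], states) for s in states}
    B = {s: normalise(emit[s], vocab) for s in states}
    return states, pi, A, B

def viterbi(words, states, pi, A, B):
    """Most likely state (field) sequence for `words`, in log space."""
    obs = [w if w in B[states[0]] else "<unk>" for w in words]
    score = {s: math.log(pi[s]) + math.log(B[s][obs[0]]) for s in states}
    back = []
    for w in obs[1:]:
        prev, score, ptr = score, {}, {}
        for s in states:
            best = max(states, key=lambda p: prev[p] + math.log(A[p][s]))
            score[s] = prev[best] + math.log(A[best][s]) + math.log(B[s][w])
            ptr[s] = best
        back.append(ptr)
    path = [max(states, key=lambda s: score[s])]
    for ptr in reversed(back):                     # follow back-pointers
        path.append(ptr[path[-1]])
    return path[::-1]

# Hypothetical labeled training data: tokens of paper headers.
train = [
    [("hidden", "title"), ("markov", "title"), ("models", "title"),
     ("by", "other"), ("l", "author"), ("zou", "author")],
    [("web", "title"), ("mining", "title"),
     ("by", "other"), ("j", "author"), ("smith", "author")],
]
states, pi, A, B = train_hmm(train)
labels = viterbi(["web", "markov", "models", "by", "zou"], states, pi, A, B)
print(labels)
```

On this toy data the decoder labels the first three tokens `title`, then `other`, then `author`. Counting is all that training requires here because the state of every token is known; that is exactly what is missing in the unlabeled case.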
To reduce the effect of local optima and initial-parameter sensitivity on recognition, the paper applies a genetic algorithm (GA). It modifies the basic genetic algorithm to account for the features of Web text and presents a GA-HMM model, which improves the HMM's training efficiency by using the genetic algorithm to search for globally optimal initial parameters for the HMM. Comparing the experimental results, the paper concludes that the GA-HMM method performs better.
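The idea of using a genetic algorithm to choose the HMM's initial parameters can be sketched as below. The abstract does not specify the thesis's modified GA, so this is a generic GA under stated assumptions: individuals are row-stochastic `pi`, `A`, `B` tables, fitness is the log-likelihood of the unlabeled sequences computed with the scaled forward algorithm, crossover swaps whole rows, and mutation rescales a row and renormalises it. All function names and settings are hypothetical, and the Baum-Welch refinement that would follow is omitted.

```python
import math
import random

def random_row(n, rng):
    """Random probability vector with strictly positive entries."""
    row = [rng.random() + 1e-3 for _ in range(n)]
    total = sum(row)
    return [x / total for x in row]

def random_hmm(n_states, n_symbols, rng):
    return {"pi": random_row(n_states, rng),
            "A": [random_row(n_states, rng) for _ in range(n_states)],
            "B": [random_row(n_symbols, rng) for _ in range(n_states)]}

def log_likelihood(hmm, seqs):
    """Sum of log P(seq | hmm) via the scaled forward algorithm."""
    n, total = len(hmm["pi"]), 0.0
    for seq in seqs:
        alpha = [hmm["pi"][i] * hmm["B"][i][seq[0]] for i in range(n)]
        for t in range(len(seq)):
            if t > 0:
                alpha = [sum(alpha[i] * hmm["A"][i][j] for i in range(n))
                         * hmm["B"][j][seq[t]] for j in range(n)]
            scale = sum(alpha)                     # rescale to avoid underflow
            total += math.log(scale)
            alpha = [a / scale for a in alpha]
    return total

def crossover(p, q, rng):
    """Child takes each probability row from one parent at random."""
    pick = lambda a, b: list(a if rng.random() < 0.5 else b)
    return {"pi": pick(p["pi"], q["pi"]),
            "A": [pick(a, b) for a, b in zip(p["A"], q["A"])],
            "B": [pick(a, b) for a, b in zip(p["B"], q["B"])]}

def mutate(hmm, rng, rate=0.2, noise=0.3):
    """With probability `rate`, perturb a row and renormalise it."""
    def jitter(row):
        if rng.random() >= rate:
            return row
        row = [x * math.exp(rng.gauss(0.0, noise)) for x in row]
        total = sum(row)
        return [x / total for x in row]
    return {"pi": jitter(hmm["pi"]),
            "A": [jitter(r) for r in hmm["A"]],
            "B": [jitter(r) for r in hmm["B"]]}

def ga_initial_hmm(seqs, n_states, n_symbols,
                   pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    fitness = lambda h: log_likelihood(h, seqs)
    pop = [random_hmm(n_states, n_symbols, rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]               # elitist selection
        children = []
        while len(elite) + len(children) < pop_size:
            p, q = rng.sample(elite, 2)
            children.append(mutate(crossover(p, q, rng), rng))
        pop = elite + children
    return max(pop, key=fitness)

# Hypothetical unlabeled data: sequences of symbol ids in {0, 1, 2}.
seqs = [[0, 1, 2, 2], [0, 0, 1, 2], [0, 1, 1, 2]]
best = ga_initial_hmm(seqs, n_states=2, n_symbols=3)
print(round(log_likelihood(best, seqs), 3))
```

The returned model is meant as a starting point: it would be passed to Baum-Welch, which then climbs to the nearest local optimum. Because the elite half survives each generation, the best fitness never decreases, but nothing in this sketch guarantees a global optimum.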
Keywords/Search Tags:Data Mining, Web Text, Hidden Markov Model, Maximum Likelihood, Genetic Algorithm