Research Of Web Text Mining Technology Based On Hidden Markov Model

Posted on:2008-07-09

Degree:Master

Type:Thesis

Country:China

Candidate:L M Zou

Full Text:PDF

GTID:2178360218953463

Subject:Computer application technology

Abstract/Summary:

With the development of network technology, the information on the Internet is growing quickly and is massive, heterogeneous in structure, and dynamic. How to find potentially useful knowledge in it has become a new research direction. Web text mining uses data mining technology to discover and automatically extract information and knowledge from Web documents and services. In the processing of network information, it is an important means of speeding up information retrieval and improving its accuracy.

The paper introduces the common techniques and classifications of Web mining. It describes the process of Web text mining, including text feature representation and extraction, text information extraction, classification, clustering, and association rules, and then introduces representative algorithms. After comparing different machine learning methods, the paper proposes a Web text mining method based on the Hidden Markov Model (HMM). It describes the collection of the experimental training dataset, the basic components of an HMM, and the three basic problems of HMMs together with their representative algorithms. For the labeled training dataset, the HMM is constructed with the Maximum Likelihood algorithm. After in-depth parsing of the paper items in the experimental dataset, the method successfully extracts the information of the different fields in the test dataset, and the experimental results show that the method is feasible.

For the unlabeled training dataset, the paper proposes a Web text mining method based on a genetic algorithm and the Hidden Markov Model. The method constructs the HMM with the Baum-Welch algorithm. Because Baum-Welch is an iterative hill-climbing training algorithm, HMM training suffers from local optima and is sensitive to the initial parameters. To reduce their influence on recognition, the paper modifies the basic genetic algorithm to account for the features of Web text and presents a GA-HMM model, which improves HMM training by using the genetic algorithm to search globally for good initial HMM parameters. Comparing the experimental results, the paper concludes that the GA-HMM method performs better.
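
The supervised step mentioned in the abstract (building an HMM from a labeled training set by maximum likelihood, then using it to label new text) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names `train_hmm` and `viterbi`, the toy field labels `title` and `author`, the `<unk>` token for unseen words, and the add-one smoothing are all assumptions made for illustration. With `smoothing=0` the counts would give the pure maximum-likelihood estimate, but zero probabilities would then break the log computation, so the sketch smooths by default.

```python
from collections import Counter, defaultdict
import math


def train_hmm(tagged_sequences, smoothing=1.0):
    """Estimate HMM parameters from labeled sequences by smoothed counting.

    Each sequence is a list of (word, state) pairs. Returns the state list,
    start probabilities pi, transition probabilities A and emission
    probabilities B as nested dicts.
    """
    states = sorted({s for seq in tagged_sequences for _, s in seq})
    vocab = sorted({w for seq in tagged_sequences for w, _ in seq})
    start = Counter()
    trans = defaultdict(Counter)
    emit = defaultdict(Counter)
    for seq in tagged_sequences:
        if not seq:
            continue
        start[seq[0][1]] += 1
        for word, state in seq:
            emit[state][word] += 1
        for (_, a), (_, b) in zip(seq, seq[1:]):
            trans[a][b] += 1

    def normalize(counter, keys):
        # Relative frequency with add-k smoothing over the given keys.
        total = sum(counter[k] for k in keys) + smoothing * len(keys)
        return {k: (counter[k] + smoothing) / total for k in keys}

    pi = normalize(start, states)
    A = {s: normalize(trans[s], states) for s in states}
    # "<unk>" (hypothetical token) reserves probability for unseen words.
    B = {s: normalize(emit[s], vocab + ["<unk>"]) for s in states}
    return states, pi, A, B


def viterbi(words, states, pi, A, B):
    """Return the most probable state sequence for `words` (log domain)."""
    if not words:
        return []

    def emission(state, word):
        return B[state].get(word, B[state]["<unk>"])

    scores = {s: math.log(pi[s]) + math.log(emission(s, words[0]))
              for s in states}
    backpointers = []
    for word in words[1:]:
        new_scores, ptr = {}, {}
        for s in states:
            best_prev = max(states,
                            key=lambda p: scores[p] + math.log(A[p][s]))
            new_scores[s] = (scores[best_prev] + math.log(A[best_prev][s])
                             + math.log(emission(s, word)))
            ptr[s] = best_prev
        scores = new_scores
        backpointers.append(ptr)

    # Follow the backpointers from the best final state.
    path = [max(states, key=lambda s: scores[s])]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    return path[::-1]


# Toy labeled data (invented for this sketch): tokens tagged with a field.
train = [
    [("hidden", "title"), ("markov", "title"), ("models", "title"),
     ("john", "author"), ("smith", "author")],
    [("web", "title"), ("text", "title"), ("mining", "title"),
     ("jane", "author"), ("doe", "author")],
]
states, pi, A, B = train_hmm(train)
labels = viterbi(["markov", "mining", "jane", "smith"], states, pi, A, B)
print(labels)
```

For the unlabeled case the abstract describes, the counting step would be replaced by Baum-Welch re-estimation, with the genetic algorithm supplying the initial values of `pi`, `A` and `B`; that part is not shown here.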

Keywords/Search Tags:

Data Mining, Web Text, Hidden Markov Model, Maximum Likelihood, Genetic Algorithm

Related items

1	Research On Hidden Group Detection Technology In Financial Transaction Network
2	Analysis And Mining Of Wiki Entry Editor Behavior Based On Hidden Markov Model
3	Algorithm Research For Text Information Extraction Based On Hidden Markov Model
4	Research On Signer Adaptation In Chinese Sign Language Recognition
5	Research And Application Of Key Technologies Based On Hidden Markov Prediction
6	Hidden Markov Model Based On Genetic Algorithm And Its Application In Evidence Fusion
7	The Algorithm Research Of Chinese Information Extraction Based On The Hidden Markov Model
8	The Study Of CT Statistical Reconstruction Algorithm Based On Maximum Likelihood And Penalized Likelihood Estimates
9	Super-Resolution Reconstruction Of Images And Video Sequences
10	Research On The Multi-Level Based Security Audit System