Research on a Chinese Web Text Classification Algorithm Based on Manifold Learning

Posted on: 2012-07-28
Degree: Master
Type: Thesis
Country: China
Candidate: J Z Li
GTID: 2218330362952638
Subject: Computer application technology
Abstract/Summary:
The Internet is growing rapidly, and the amount of information published on websites each day is increasing exponentially. Managing and organizing this fast-growing information effectively, and finding the requested information quickly and accurately within it, has become an urgent problem for data mining practitioners. Web text classification algorithms can address this problem, but traditional classification algorithms perform poorly on high-dimensional Web text. Dimensionality reduction is the key to solving this: an accurate Web text classification algorithm combined with an efficient dimensionality reduction algorithm can greatly improve classification accuracy and save users' time. Web text classification will play an important role in digital libraries, search engines, information retrieval and similar applications, so its prospects are promising. This thesis focuses on Chinese Web text classification algorithms based on manifold learning. The main contents are as follows: (1) First, it introduces the advantages and deficiencies of the current mainstream dimensionality reduction and classification algorithms as applied to Web text categorization, analyzes the broad prospects of manifold learning algorithms for text dimensionality reduction, and proposes applying manifold learning to Web text. It also introduces the advantages of manifold learning algorithms in handling nonlinear data, with emphasis on several of them (Isomap, LLE and MDS), and experimental data demonstrate their effectiveness in extracting the low-dimensional structure embedded in high-dimensional data.
The preprocessing of Chinese web pages is also discussed in depth. (2) Second, a Chinese Web text classification algorithm based on manifold learning is proposed: dimensionality is first reduced with a manifold learning algorithm, and the reduced data are then classified. The ISOMAP algorithm is applied for the reduction, followed by traditional classification algorithms for high-dimensional data. The efficiency and accuracy of the classifiers before and after dimensionality reduction are compared; with the best estimate of the manifold learning parameters, classification efficiency is multiplied without sacrificing accuracy. (3) Finally, the parameters and the constructed classifiers are evaluated. Classifier performance is compared comprehensively in terms of the selected dimension, precision, and the changes in recall and precision, in order to determine the optimal parameters and the optimal path. A new classifier evaluation standard, the (HF1-T) value, is proposed; it shows that the classifier that first reduces dimension with ISOMAP and then classifies is greatly improved. The concept of a network information retrieval system based on manifold learning is also put forward.
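The evaluation in part (3) rests on precision and recall. As a minimal sketch of how those standard measures and the F1 score could be computed per category for a multi-class text classifier, the Python below may help. It does not reproduce the (HF1-T) value, which the abstract names but does not define. The function names and the six sample labels are hypothetical, chosen only for illustration.

```python
from collections import Counter


def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label seen."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1      # correct prediction for this class
        else:
            fp[pred] += 1       # predicted class gains a false positive
            fn[truth] += 1      # true class gains a false negative
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (0.0 if no labels)."""
    scores = per_class_scores(y_true, y_pred)
    if not scores:
        return 0.0
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


# Hypothetical category labels for six web pages (illustrative only).
y_true = ["sports", "sports", "finance", "finance", "tech", "tech"]
y_pred = ["sports", "finance", "finance", "finance", "tech", "sports"]

scores = per_class_scores(y_true, y_pred)
for label, (p, r, f1) in scores.items():
    print(f"{label}: precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.3f}")
```

Running the same scoring on classifier output before and after dimensionality reduction would give one way to make the accuracy comparison described above, under the assumption that both runs use the same test pages.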
Keywords/Search Tags: Manifold Learning, web text classification, web preprocessing, evaluation standard, ISOMAP