Semi-supervised Web-page Classification And Its Application In Directory-style Search Engines

Posted on:2009-09-23

Degree:Master

Type:Thesis

Country:China

Candidate:J N Tan

GTID:2178360275951032

Subject:Computer application technology

Abstract/Summary:

With the rapid development of information networks, search engines, including directory-style search engines, have become an important tool for information retrieval. However, directory-style search engines rely on editorial staff to classify web pages, which leads to several defects: low efficiency, a limited amount of information, and information that cannot be updated in a timely manner. In addition, unlabeled samples are far more numerous than labeled ones, so how to use the unlabeled samples to build a classifier has become a key issue in the study of automatic web-page classification. Research on semi-automatic web-page classification for directory-style search engines therefore has high academic value and great practical significance.

This thesis discusses the advantages of semi-supervised web-page classification, states the purpose and significance of the research, and reviews the state of research in China and abroad. To address problems such as class skew and, for the TSVM (transductive support vector machine) algorithm, the difficulty of determining the class proportions among the unlabeled samples, it combines data fusion theory with fuzzy clustering theory and presents a semi-supervised hypertext classification algorithm based on fuzzy clustering. The main achievements of this work are as follows:

1. Traditional text feature extraction methods are reviewed, and several typical feature extraction methods are analyzed and implemented.
2. To address the class skew and high dimensionality that web-text features tend to cause, a web-text feature extraction method based on adaptive data fusion is presented.
3. To address the difficulty, in the TSVM algorithm, of determining the class proportions among the unlabeled samples, fuzzy clustering methods are studied and a semi-supervised classification method based on fuzzy clustering (FC_TSVM) is presented, which uses page link information as an important basis for classification.
4. A directory-style search engine based on the semi-supervised hypertext classification algorithm is designed and implemented. It realizes both the web-text feature extraction method based on adaptive data fusion and the semi-supervised classification method based on fuzzy clustering presented in the thesis.
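
The idea behind the third achievement can be illustrated with a small sketch: a TSVM-style learner needs an estimate of the fraction of positive samples among the unlabeled data, and fuzzy clustering is one way to obtain it. The code below is not the thesis's FC_TSVM algorithm, whose details the abstract does not give. It is a plain fuzzy c-means routine written for illustration. The function names `fuzzy_c_means` and `estimate_positive_fraction`, the toy 2-D points, the farthest-point initialization, and the rule of matching the positive cluster to the mean of the labeled positives are all assumptions.

```python
import math


def fuzzy_c_means(points, n_clusters=2, m=2.0, n_iter=100, tol=1e-6):
    """Plain fuzzy c-means; returns (centers, memberships)."""
    dim = len(points[0])
    # Deterministic farthest-point initialisation (an illustrative choice).
    centers = [list(points[0])]
    while len(centers) < n_clusters:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(far))

    memberships = []
    exponent = 2.0 / (m - 1.0)
    for _ in range(n_iter):
        # Membership of a point in cluster k: 1 / sum_j (d_k / d_j)^(2/(m-1)).
        memberships = []
        for p in points:
            dists = [max(math.dist(p, c), 1e-12) for c in centers]
            memberships.append([
                1.0 / sum((dk / dj) ** exponent for dj in dists)
                for dk in dists
            ])
        # Each center becomes the mean of all points weighted by u^m.
        new_centers = []
        for k in range(n_clusters):
            weights = [row[k] ** m for row in memberships]
            total = sum(weights)
            new_centers.append([
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)
            ])
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships


def estimate_positive_fraction(unlabeled, labeled_pos):
    """Estimate the share of positives among the unlabeled points."""
    centers, memberships = fuzzy_c_means(unlabeled, n_clusters=2)
    dim = len(labeled_pos[0])
    pos_mean = [sum(p[d] for p in labeled_pos) / len(labeled_pos)
                for d in range(dim)]
    # Assumption: the cluster nearest the labeled positives is the positive one.
    pos_k = min(range(len(centers)),
                key=lambda k: math.dist(centers[k], pos_mean))
    # Average soft membership in that cluster serves as the estimated fraction.
    return sum(row[pos_k] for row in memberships) / len(memberships)


# Hypothetical toy data: 3 points near (0, 0) and 7 points near (5, 5).
unlabeled = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
             (5.0, 5.1), (5.2, 4.9), (4.9, 5.0), (5.1, 5.2),
             (4.8, 5.1), (5.0, 4.8), (5.3, 5.0)]
labeled_pos = [(5.0, 5.0), (5.1, 4.9)]

fraction = estimate_positive_fraction(unlabeled, labeled_pos)
print(f"estimated positive fraction: {fraction:.2f}")
```

On this toy data the estimate comes out close to 0.7, because seven of the ten unlabeled points sit near the labeled positives. In a real setting the points would be high-dimensional page feature vectors, and the estimate would be passed to the semi-supervised classifier as its assumed class proportion.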

Keywords/Search Tags:

search engine, feature extraction, web-page classification, hyperlink, data fusion, fuzzy clustering, transductive Support Vector Machines

Related items

1	The Study Of The Relevant Techniques In Fuzzy Support Vector Machines
2	Studies And Application Of Fuzzy And Double Regular Support Vector Machines
3	The Study And Implementation On The Key Problems Of Intelligent Search Engine Technology
4	The Research And Implementation Of Web Page Classification In Enterprise Search Engine
5	Research On Web Hyperlink Analysis And Its Application In Search Engine
6	Research And Application Of Image Classification Based On Transductive Support Vector Machines
7	Multi-Label Classification Based On Fuzzy Kernel Clustering And Fuzzy Support Vector Machine
8	Studies Of Some Problems In Support Vector Machines And Semi-supervised Learning
9	Chinese Web Page Classification Based On Web Page Features
10	Application Of Support Vector Machine And Fuzzy Theory For Remote Sensing Image Classification