Semi-supervised Web-page Classification And Its Application In Directory-style Search Engines

Posted on: 2009-09-23
Degree: Master
Type: Thesis
Country: China
Candidate: J N Tan
Full Text: PDF
GTID: 2178360275951032
Subject: Computer application technology
Abstract/Summary:
With the rapid development of information networks, search engines, including directory-style search engines, have become an important tool for information retrieval. However, directory-style search engines rely on editorial staff to classify web pages, which leads to several defects: low efficiency, limited information, and information that cannot be updated in a timely manner. In addition, there are a large number of unlabeled samples compared with labeled ones, and how to use these samples to build a classifier has become a key issue in the study of automatic web-page classification. Research on semi-supervised web-page classification for directory-style search engines therefore has high academic value and great practical significance.

This thesis discusses the advantages of semi-supervised web-page classification and the purpose and significance of the research, and reviews the state of research in China and abroad. To address problems such as class skew and, for the TSVM algorithm, the difficulty of determining the class proportions among unlabeled samples, it combines data fusion theory and fuzzy clustering theory to present a semi-supervised hypertext classification algorithm based on fuzzy clustering. The main contributions of this work are as follows:

1. Reviewed traditional text feature extraction methods, and analyzed and implemented several typical ones.
2. To address the class skew and high dimensionality caused by web-text features, presented a web-text feature extraction method based on adaptive data fusion.
3. To address the difficulty, in the TSVM algorithm, of determining the class proportions among unlabeled samples, studied fuzzy clustering methods and presented a semi-supervised classification method based on fuzzy clustering (FC_TSVM), which uses page link information as an important basis for classification.
4. Designed and implemented a directory-style search engine based on the semi-supervised hypertext classification algorithm, realizing the web-text feature extraction method based on adaptive data fusion and the semi-supervised classification method based on fuzzy clustering presented in this thesis.
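The abstract does not describe how FC_TSVM works internally, so the following is only an illustrative sketch of the general idea behind contribution 3: estimating the proportion of one class among unlabeled samples with fuzzy clustering, a value that TSVM otherwise needs to be told in advance. It uses textbook fuzzy c-means rather than the thesis's own method. The function names `fuzzy_c_means` and `estimate_positive_fraction`, the farthest-point initialization, and the use of the labeled positive mean to identify the "positive" cluster are all assumptions made for this example.

```python
import math


def fuzzy_c_means(points, n_clusters=2, m=2.0, n_iter=100, tol=1e-6):
    """Textbook fuzzy c-means. Returns (centers, memberships).

    memberships[i][k] is the degree to which points[i] belongs to cluster k;
    each row sums to 1. Initialization is an assumption of this sketch:
    start from the first point, then greedily add the point farthest from
    the centers chosen so far, which keeps the routine deterministic.
    """
    centers = [list(points[0])]
    while len(centers) < n_clusters:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(farthest))

    dim = len(points[0])
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(n_iter):
        # Membership update: closer centers receive higher membership.
        memberships = []
        for p in points:
            # Clamp distances to avoid division by zero when p sits on a center.
            dists = [max(math.dist(p, c), 1e-12) for c in centers]
            memberships.append(
                [1.0 / sum((d_i / d_k) ** exponent for d_k in dists) for d_i in dists]
            )
        # Center update: mean of the points weighted by membership ** m.
        new_centers = []
        for k in range(n_clusters):
            weights = [u[k] ** m for u in memberships]
            total = sum(weights)
            new_centers.append(
                [sum(w * p[j] for w, p in zip(weights, points)) / total for j in range(dim)]
            )
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships


def estimate_positive_fraction(unlabeled, labeled_pos):
    """Estimate the share of positive samples among the unlabeled points.

    Hypothetical helper: the cluster whose center lies nearest to the mean
    of the labeled positive samples is treated as the positive cluster, and
    the estimate is the average membership of unlabeled points in it.
    """
    centers, memberships = fuzzy_c_means(unlabeled, n_clusters=2)
    pos_mean = [sum(col) / len(labeled_pos) for col in zip(*labeled_pos)]
    pos_cluster = min(range(2), key=lambda k: math.dist(centers[k], pos_mean))
    return sum(u[pos_cluster] for u in memberships) / len(unlabeled)
```

In a TSVM-style pipeline, the returned fraction could stand in for the positive-class ratio that would otherwise have to be guessed. How the thesis actually combines the clustering result with TSVM, and how it incorporates page link information, is not stated in the abstract.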
Keywords/Search Tags:search engine, feature extraction, web-page classification, hyperlink, data fusion, fuzzy clustering, transductive Support Vector Machines