Research And Implementation Of Content Oriented Web Page Classification

Posted on: 2018-12-01
Degree: Master
Type: Thesis
Country: China
Candidate: D Zhang
Full Text: PDF
GTID: 2348330536979940
Subject: Computer technology
Abstract/Summary:
In the information age, the Internet has penetrated all aspects of social life, and browsing the web has become the main way of obtaining information. However, with the rapid development of the Internet, the number of web pages has increased rapidly. Faced with massive and complex web pages, users cannot quickly find the correct information, which results in a poor user experience. How to classify web pages by theme is therefore the research focus of this thesis. The classification process consists of two steps: web page data preprocessing and text classification. Preprocessing is performed first, after the data is acquired, and includes web page denoising, Chinese word segmentation, feature selection and text representation. The resulting numerical data is then fed into the classification model for text classification. This thesis improves both feature selection and text classification. According to the characteristics of web page content and structure, the Bloom Filter and TF-IDF algorithms are each improved. To eliminate redundant feature items, the two improved algorithms are combined into a feature selection scheme based on feature reduction. The support vector machine algorithm, which is used to handle large-scale data effectively, is also improved. According to the principle of kernel functions, a new mixed kernel is constructed. The best parameters of the new mixed kernel are then searched for by a genetic algorithm combined with cross-validation. In this way, a support vector machine with both learning ability and generalization ability is built. Simulation results show that the improved algorithm is superior to the traditional algorithm. Finally, a web page classification system based on the improved algorithms is implemented and used to classify web pages. The classification results verify that the proposed algorithms achieve better classification results in most cases and have some practical value in application.
Keywords/Search Tags: Web page classification, Feature selection, Support vector machine, Kernel function, Genetic algorithm
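
The abstract does not state the form of the new mixed kernel. As an illustration only, the sketch below shows one common construction: a convex combination of a Gaussian (RBF) kernel, usually credited with strong local learning ability, and a polynomial kernel, usually credited with better generalization. The function names, the choice of these two base kernels, and all default parameter values are assumptions made for this example and are not taken from the thesis.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def poly_kernel(x, y, degree=2, coef0=1.0):
    """Polynomial kernel: (<x, y> + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + coef0) ** degree


def mixed_kernel(x, y, weight=0.7, gamma=0.5, degree=2, coef0=1.0):
    """Hypothetical mixed kernel: weight * RBF + (1 - weight) * polynomial.

    With 0 <= weight <= 1 this is a convex combination of two valid
    kernels, so the result is itself a valid (positive semidefinite) kernel.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return (weight * rbf_kernel(x, y, gamma)
            + (1.0 - weight) * poly_kernel(x, y, degree, coef0))
```

In a setup like this, `weight`, `gamma` and `degree` would be the parameters a genetic algorithm searches over, with cross-validated accuracy serving as the fitness of each candidate.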