
Research On Webpage Classification Algorithm Based On Deep Learning

Posted on: 2017-09-26
Degree: Master
Type: Thesis
Country: China
Candidate: Q X Chen
Full Text: PDF
GTID: 2428330590468329
Subject: Electronics and Communications Engineering
Abstract/Summary:
The twenty-first century is an era of rapid Internet development. An information explosion has followed, and the network is filled with web pages of every kind. Users cannot quickly find the sites that meet their needs, and the efficiency of information retrieval keeps falling. A webpage classification system determines the category of a page from its content and related information. Building such a system not only helps organize web pages effectively, but also standardizes web management and improves the efficiency with which people obtain information online, filtering more accurate information and presenting it to the user quickly. It is therefore urgent to establish a system that can classify web pages automatically. This thesis studies webpage classification based on deep learning algorithms, with the goal of building an automatic webpage classification system.

Traditional webpage classification methods, such as KNN or SVM, classify pages using their text content. However, the results of these algorithms soon reach a bottleneck, and their precision is generally difficult to improve further. This thesis proposes applying deep learning to webpage classification. A deep learning model is a network with multiple hidden layers. It emphasizes depth and layer-by-layer training, combining low-level features into more abstract high-level features. Its most attractive property is that it learns features automatically, without manual feature engineering, and the automatically generated features lead to highly accurate results. Deep learning has been applied in many fields with good results. This thesis applies it to webpage classification, and the experiments show that it is effective there as well.

This thesis combines current webpage classification technology with the construction of a text classification system, and crawls a certain number of web pages to carry out automatic classification. First, appropriate content is selected and extracted from each page, the Chinese text is segmented into words, the words are taken as the features of the page, and a feature weighting method is applied to build the page's feature vector under the vector space model. Next, the deep learning model is trained on the training pages. Finally, the trained model is tested on other pages to measure classifier performance. To compare the deep learning classifier with other webpage classifiers, I add a contrast experiment using cosine distance classification. The experimental results show that deep learning classification outperforms traditional cosine distance classification. I also study the parameters of the stacked autoencoder and their influence on the network training process; the experiments show how the webpage classification results differ as the parameters change.
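The cosine distance baseline and the vector space representation mentioned above can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not name the feature weighting scheme or the decision rule, so the sketch assumes TF-IDF weights with a smoothed IDF and a nearest-centroid rule under cosine similarity. The function names (`tfidf_vectors`, `weight`, `centroids`, `classify`) and the toy documents are hypothetical, and the tokens are assumed to be already segmented into words.

```python
import math
from collections import Counter, defaultdict


def weight(tokens, idf):
    """Turn a token list into a sparse TF-IDF vector (dict: term -> weight).
    Terms never seen during training have no IDF and are dropped."""
    tf = Counter(tokens)
    return {t: count * idf[t] for t, count in tf.items() if t in idf}


def tfidf_vectors(docs):
    """Build TF-IDF vectors for a list of token lists.
    Assumption: smoothed IDF, log((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # document frequency counts each term once per doc
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [weight(tokens, idf) for tokens in docs], idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroids(vectors, labels):
    """Average the training vectors of each category into one centroid."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = Counter(labels)
    for vec, label in zip(vectors, labels):
        for t, w in vec.items():
            sums[label][t] += w
    return {
        label: {t: w / counts[label] for t, w in s.items()}
        for label, s in sums.items()
    }


def classify(tokens, idf, cents):
    """Assign the category whose centroid is most similar to the page vector."""
    vec = weight(tokens, idf)
    return max(cents, key=lambda label: cosine(vec, cents[label]))


# Toy data standing in for segmented page text (hypothetical).
train_docs = [
    ["football", "match", "goal", "team"],
    ["team", "coach", "match", "league"],
    ["stock", "market", "bank", "price"],
    ["bank", "loan", "price", "market"],
]
train_labels = ["sports", "sports", "finance", "finance"]

vectors, idf = tfidf_vectors(train_docs)
cents = centroids(vectors, train_labels)

print(classify(["goal", "league", "team"], idf, cents))
print(classify(["loan", "stock"], idf, cents))
```

In the pipeline described in the abstract, the same feature vectors would also be the input to the stacked autoencoder, so a baseline like this one and the deep model are compared on an identical representation.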
Keywords/Search Tags:webpage classification, deep learning, stacked autoencoder, cosine distance classification