
Research And Analysis Of Efficient Web Page Classification Technology Based On Deep Learning

Posted on: 2020-10-12
Degree: Master
Type: Thesis
Country: China
Candidate: Q W Wu
Full Text: PDF
GTID: 2428330572472248
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, with the rapid development of network communication technology, the Internet has become part of nearly every aspect of daily life, and the number of web pages has grown exponentially. Faced with massive and complex web page information, organizing and managing that information efficiently has become a difficult problem. Web page classification, as a fundamental step in organizing and managing Internet information, plays a vital role in many applications, such as search engines, topic crawlers, malicious web page recognition, and the maintenance of directory-based websites.

Traditional Web mining usually classifies web pages by combining hand-crafted web page features with machine learning algorithms. However, as web page structure grows more complex, extracting effective features becomes increasingly difficult, which makes traditional machine learning methods less effective for automatic web page classification. This thesis therefore proposes an efficient web page classification algorithm based on deep learning. Using the text content, title, and other information of a web page, a deep neural network is used to build a classification model with multi-channel input and a composite feature extraction structure. The model effectively improves the accuracy of web page classification and meets the requirements of efficient, automatic classification of web pages in specific fields.

The main work of this thesis is as follows:

1. Analyze the advantages and disadvantages of traditional machine learning methods in web page mining, and introduce the characteristics and advantages of deep learning for web page classification; describe the technology for collecting and storing web page data; study and analyze word vector technology; analyze the feasibility of applying the attention mechanism to the web page classification problem; and study the core algorithmic principles and applications of convolutional neural networks and recurrent neural networks.

2. Design the framework of an efficient web page classification algorithm based on deep learning, including data acquisition and preprocessing. A word vector pre-training process is designed to introduce external semantics into the neural network. Suitable neural network feature extraction models are designed for the web page title, content, and structure. A machine learning model is also incorporated into the framework, and a correction mechanism is designed to improve the classification results.

3. Complete the training and tuning of the deep learning based web page classification model. A data generator and a multi-GPU parallel method are used to train the neural network model efficiently, and a result feedback mechanism is implemented in the training process. With this efficient training method, the model is tuned over multiple targeted rounds, and the results of the tuning experiments are analyzed in detail.
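
As a rough illustration of the multi-channel input and attention ideas named in the abstract, the following minimal sketch pools the word vectors of each page channel (for example title and content) with dot-product attention and concatenates the results into one feature vector. It is not the thesis's model: the function names attention_pool and encode_page, the toy two-dimensional embeddings, and the fixed query vector are all assumptions made for this example. In a real system the embeddings would come from a pre-trained word vector model and the query would be learned.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors, query):
    """Dot-product attention pooling (hypothetical helper).

    Each word vector is scored by its dot product with the query; the
    softmax of the scores gives the weights of a weighted average.
    """
    scores = [sum(v_d * q_d for v_d, q_d in zip(v, query)) for v in vectors]
    weights = softmax(scores)
    pooled = [
        sum(w * v[d] for w, v in zip(weights, vectors))
        for d in range(len(query))
    ]
    return pooled, weights


def encode_page(channels, embeddings, query):
    """Pool every channel separately, then concatenate (hypothetical helper).

    channels maps a channel name to its token list. Tokens without an
    embedding are skipped; an empty channel contributes a zero vector.
    Channels are visited in sorted name order so the layout is stable.
    """
    features = []
    for name in sorted(channels):
        vectors = [embeddings[t] for t in channels[name] if t in embeddings]
        if vectors:
            pooled, _ = attention_pool(vectors, query)
        else:
            pooled = [0.0] * len(query)
        features.extend(pooled)
    return features


# Toy data, assumed for illustration only.
toy_embeddings = {"football": [1.0, 0.0], "league": [0.8, 0.2], "recipe": [0.0, 1.0]}
toy_page = {"title": ["football", "league"], "content": ["league", "recipe", "today"]}
toy_query = [1.0, 0.0]

print(encode_page(toy_page, toy_embeddings, toy_query))
```

The concatenated vector would then be passed to a classifier. Keeping one pooled vector per channel lets the downstream model weigh title and content evidence separately, which is the general motivation for a multi-channel design.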
Keywords/Search Tags: Web page classification, Neural Networks, Word2Vec, Attention Mechanism