
Deep Learning Web Page Classification Based On Semantic Feature Fusion

Posted on: 2021-07-24
Degree: Master
Type: Thesis
Country: China
Candidate: Z Lin
Full Text: PDF
GTID: 2518306305981359
Subject: Information Science
Abstract/Summary:
With the development of Internet technology, a wide variety of websites have appeared in cyberspace, making it much more convenient for people to obtain information. The number of web pages grows exponentially as the number of websites increases. Faced with this volume of website resources, providing an efficient, accurate, and reasonable method for classifying and filtering web pages, so that people can find the resources they need, has become an important and meaningful problem.

Text classification is a classic topic in natural language processing, and web page classification is, in the final analysis, a text classification problem. Web page classification is one of the most basic problems in Internet-based resource management and organization, and it plays an irreplaceable role in Internet applications such as search engines, web page tampering detection, and malicious website identification. Traditional web data mining techniques usually combine web feature analysis with shallow machine learning methods to classify web pages. As the data structure of web pages grows more complex, however, it becomes increasingly difficult to extract effective features from unstructured web data, and traditional machine learning struggles to achieve further breakthroughs in web page classification.

Deep learning models have gradually become the mainstream technology for text classification. Feature text vectors constructed with deep learning models can accurately express the meaning and semantic information of a text, but they are also prone to problems such as sparsity, which leads to poor classification results.

In response to these problems, this dissertation proposes a deep learning web page classification model based on feature semantic fusion. The model uses TextCNN to extract text semantic features from the important tags of a web page (title, keyword, description), uses XLNet to extract text semantic features from the content of the other tags, and fuses the two sets of features to classify the page. To address feature collinearity and vector sparsity during network feature fusion, the dissertation introduces a feature semantic fusion mechanism that strengthens the fusion of the semantic features of the important tags with those of the other tags. This further enhances the semantic representation of the web page text, thereby improving the accuracy, recall, and F1 score of web page classification, as well as the generalization ability of the model. Experimental results show that the proposed deep learning model based on semantic feature fusion can effectively classify web page text with high accuracy.
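The abstract separates the text of the important tags (title, keyword, description) from the text of all other tags before the two are passed to different feature extractors. It does not say how that split is done, so the following is only a minimal sketch using Python's standard `html.parser`. The names `TagTextSplitter`, `split_page`, and `SAMPLE` are hypothetical. Reading keywords and description from `<meta>` tags, and skipping `<script>` and `<style>` content, are assumptions rather than details from the dissertation.

```python
from html.parser import HTMLParser


class TagTextSplitter(HTMLParser):
    """Collect 'important tag' text and 'other tag' text separately.

    Hypothetical helper: the dissertation does not describe its parser.
    """

    def __init__(self):
        super().__init__()
        self.important = {"title": "", "keywords": "", "description": ""}
        self.other = []
        self._in_title = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            # Assumption: script and style bodies are not page text.
            self._skip_depth += 1
        elif tag == "meta":
            attr = dict(attrs)
            name = (attr.get("name") or "").lower()
            # Assumption: keywords/description live in <meta name=...>.
            if name in ("keywords", "description"):
                self.important[name] = (attr.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text or self._skip_depth:
            return
        if self._in_title:
            self.important["title"] += text
        else:
            self.other.append(text)


def split_page(html):
    """Return (important_text, other_text) for one HTML document."""
    parser = TagTextSplitter()
    parser.feed(html)
    parser.close()
    important_text = " ".join(v for v in parser.important.values() if v)
    other_text = " ".join(parser.other)
    return important_text, other_text


# Made-up page used only to exercise the sketch.
SAMPLE = """<html><head>
<title>Campus News</title>
<meta name="keywords" content="university, news">
<meta name="description" content="Latest campus updates">
<style>body { margin: 0 }</style>
</head><body><h1>Welcome</h1><p>Term starts in September.</p>
<script>var x = 1;</script></body></html>"""

important_text, other_text = split_page(SAMPLE)
print(important_text)
print(other_text)
```

In the model the abstract describes, the first string would feed the TextCNN branch and the second the XLNet branch. The feature extraction and the fusion step are left out here because the abstract gives no detail on them.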
Keywords/Search Tags: Text Classification, Web Page Classification, Deep Learning, Semantic Feature Fusion