
Research On Webpage Classification Based On Sparse Auto-Encoder And Layer-wise Back Propagation

Posted on: 2016-12-03 | Degree: Master | Type: Thesis
Country: China | Candidate: G J Jiang | Full Text: PDF
GTID: 2298330470457730 | Subject: Computer application technology
Abstract/Summary:
With the development of the Internet, the number of webpages is increasing rapidly, which signals that the era of big data is approaching. A large volume of disorganized page text makes it hard for people to find and filter information. Webpage classification is therefore important for web search and text mining.

Webpage classification generally uses traditional algorithms such as SVM, BP neural networks, and Naive Bayes, with features selected by information gain, mutual information, or maximum entropy models. Of these, information gain achieves the best performance, but its threshold is difficult to determine. In addition, BP neural networks with many layers tend to fall into local minima, generalize poorly on complex functions, and train inefficiently. To solve these problems, this thesis proposes a webpage classifier based on SAE-LBP, which combines a Sparse Auto-Encoder (SAE) with a Layer-wise BP (LBP) neural network.

The main work of this thesis is as follows:

1. Based on the semi-structured nature of webpages, the weighting of the feature representation is improved, and classification accuracy increases by about 1% compared with the traditional BP neural network. For a specific data set, this thesis groups HTML tags and sets their weights using statistics on how often each tag occurs in the data set, combined with an analysis of the role of each tag.

2. To handle the sparsity of webpages, the SAE, a deep learning method, is used to select high-level, abstract features, and classification accuracy improves by about 4% compared with the traditional BP neural network. By adding a sparse representation to the BP neural network model, the SAE better describes the sparsity characteristics of webpages. By adding a penalty term to the BP neural network model, the SAE also avoids the over-fitting problem.

3. To address oscillation between iterations and slow adjustment, the traditional automatic learning-rate adjustment algorithm is improved. Experiments show that it improves time performance by 40%~60% compared with the traditional BP neural network. The algorithm sets a lower limit to avoid adjusting the learning rate too many times. In addition, when the error rises too frequently, it sets a lower learning rate to avoid oscillation. The algorithm also sets an upper limit to prevent the error from increasing in the next iteration.

4. Because random initial values make BP neural network training converge slowly, the LBP neural network is proposed to train the BP neural network layer by layer. Experiments show that it improves time performance by 40%~60% compared with the traditional BP neural network. This thesis uses the LBP neural network to train the SAE-LBP classifier. The LBP training algorithm begins with 3 layers and trains the BP neural network by stacking layers until the target number of layers is reached. To bring the parameter values of the lower layers close to their optimal values, the BP neural network is pre-trained. Pre-training needs only a limited number of iterations and does not need to reach the convergence point, so pre-training with the LBP neural network efficiently avoids unnecessary iterations.

The SAE-LBP webpage classifier uses the SAE to select deep features, which improves classification accuracy. It also uses the LBP neural network, based on the automatic learning-rate adjustment algorithm, which improves time performance. Experimental results show that the SAE-LBP webpage classifier improves classification accuracy by about 5.19% and time performance by 83.86% compared with the traditional BP neural network.
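
The abstract does not say what form the SAE penalty term in item 2 takes. As an illustration only, the sketch below uses the KL-divergence sparsity penalty that is common in sparse auto-encoders, which may differ from the thesis's own formulation. The function names, the target sparsity `rho`, and the weight `beta` are assumptions chosen for this example and do not come from the thesis.

```python
import math


def mean_activations(hidden_batch):
    """Average each hidden unit's activation over a batch.

    hidden_batch is a list of rows, one row of sigmoid activations
    (values in (0, 1)) per training example.
    """
    n = len(hidden_batch)
    return [sum(column) / n for column in zip(*hidden_batch)]


def kl_sparsity_penalty(rho_hats, rho=0.05, beta=3.0):
    """Return beta * sum_j KL(rho || rho_hat_j).

    rho is the assumed target average activation and beta the assumed
    penalty weight; both are illustrative defaults, not thesis values.
    """
    eps = 1e-12  # keeps the logarithms finite at 0 and 1
    total = 0.0
    for rho_hat in rho_hats:
        rho_hat = min(max(rho_hat, eps), 1.0 - eps)
        total += rho * math.log(rho / rho_hat)
        total += (1.0 - rho) * math.log((1.0 - rho) / (1.0 - rho_hat))
    return beta * total


# Two examples, two hidden units: the penalty is added to the usual
# reconstruction error before back propagation.
batch = [[0.1, 0.9], [0.3, 0.7]]
penalty = kl_sparsity_penalty(mean_activations(batch))
```

Under this assumed formulation, the penalty is zero when every hidden unit's average activation equals `rho` and grows as units become more active on average, which pushes the network toward sparse hidden representations.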
Keywords/Search Tags: SAE, LBP, Webpage Classification, Deep Learning, Neural Networks