Web Pages Classification Based On MIMLRBF Neural Network

Posted on: 2015-01-04
Degree: Master
Type: Thesis
Country: China
Candidate: M M Wang
GTID: 2308330503475082
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development and spread of the Internet, a great deal of information is now published online. Web page classification emerged to help people find useful information: it uses machine learning to assign labels to web pages automatically. Among the many web page classification algorithms, the RBF neural network has become a research focus in machine learning because of its strong learning and classification ability.

This thesis first introduces the development of the RBF neural network, its working principle, and related techniques, including common training methods. It also studies the multi-instance multi-label (MIML) learning framework and related algorithms. The focus is the MIMLRBF neural network algorithm, which uses an RBF neural network to solve multi-instance multi-label learning problems.

When the sample set is imbalanced, the number of hidden-layer neurons per class is also imbalanced, and classification results suffer because minority-class samples are neglected during training. We therefore propose an improved method. First, the size of the class with the fewest samples is counted, and a number of initial cluster centers based on that count is selected for every class. Then each remaining sample in a class is tested to decide whether it can become a new cluster center, depending on the size of that class. Finally, the centers are optimized using the related algorithm. Each cluster center corresponds to one hidden neuron, so the number of neurons is determined dynamically from the number of samples and becomes balanced, which reduces the effect of class imbalance on the network.

The classical MIMLRBF neural network algorithm uses the same width value for every radial basis function, without taking into account the density of samples near each center. To address this, the thesis proposes an improved algorithm that considers the sample distribution within each cluster. First, the center of every cluster is found, and the average and variance of the distances between centers are computed. Then the variance of every cluster, which reflects the distribution of its samples, is computed. Finally, a suitable width value is determined from both the cluster distribution and the overall sample distribution. The resulting network is smoother.

Finally, the improved training algorithms are compared with three classical algorithms on two data sets and are applied to a web page classification system. The experimental data show that the improved algorithms achieve higher efficiency and accuracy.
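The abstract does not give the width formula, so the following is only a minimal sketch of the general idea of per-cluster widths, under stated assumptions. It treats each sample as a plain feature vector with Euclidean distance (MIMLRBF itself works on bags of instances), it uses only the mean distance between centers rather than the variance as well, and the blending rule with the parameters `mu` and `alpha` is hypothetical, not the thesis's method. All function names are illustrative.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_center_distance(centers):
    """Average pairwise distance between cluster centers."""
    if len(centers) < 2:
        raise ValueError("need at least two centers")
    dists = [
        euclidean(centers[i], centers[j])
        for i in range(len(centers))
        for j in range(i + 1, len(centers))
    ]
    return sum(dists) / len(dists)


def cluster_spread(center, members):
    """Root-mean-square distance of a cluster's members to its center."""
    if not members:
        return 0.0
    total = sum(euclidean(center, m) ** 2 for m in members)
    return math.sqrt(total / len(members))


def per_cluster_widths(centers, clusters, mu=1.0, alpha=0.5):
    """Assumed rule: blend a shared base width with each cluster's relative spread.

    base = mu * (mean distance between centers) is the single shared width;
    alpha in (0, 1] keeps every width strictly positive.
    """
    base = mu * mean_center_distance(centers)
    spreads = [cluster_spread(c, ms) for c, ms in zip(centers, clusters)]
    mean_spread = sum(spreads) / len(spreads)
    if mean_spread == 0.0:
        # No within-cluster variation: fall back to the shared width.
        return [base] * len(centers)
    return [base * (alpha + (1.0 - alpha) * s / mean_spread) for s in spreads]


def rbf(x, center, width):
    """Gaussian radial basis function with a per-center width."""
    return math.exp(-euclidean(x, center) ** 2 / (2.0 * width ** 2))


if __name__ == "__main__":
    centers = [(0.0, 0.0), (4.0, 0.0)]
    clusters = [
        [(0.0, 1.0), (0.0, -1.0)],   # tight cluster
        [(4.0, 3.0), (4.0, -3.0)],   # spread-out cluster
    ]
    widths = per_cluster_widths(centers, clusters)
    print(widths)
```

In this sketch a tightly packed cluster receives a narrower basis function than a spread-out one, while both stay anchored to the shared scale set by the distances between centers.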
Keywords/Search Tags: Web pages classification, MIMLRBF neural network, Imbalanced samples, Radial basis function width, Sample distribution