Improved Web Pages Classification Algorithm Based On MIMLRBF Neural Network

Posted on: 2017-02-14
Degree: Master
Type: Thesis
Country: China
Candidate: J X Li
GTID: 2348330566957466
Subject: Computer technology
Abstract/Summary:
With the advent of the Big Data era, the digital city and digital planet are evolving toward the smart city and smart planet. Being smart requires a reasonable allocation of resources, and extracting useful information from massive numbers of web pages is an important and difficult step in that process. Automatic web page classification can effectively address this problem, and the MIMLRBF neural network, a new learning framework, has attracted the attention of many researchers with its excellent classification results.

This thesis introduces web page classification technology, describes the development of the MIML learning framework and the RBF neural network, and presents in detail the traditional RBF training algorithm, the MIMLRBF training algorithm, and a variety of approaches to solving MIML problems.

A careful analysis of the training process shows that, when the traditional MIMLRBF algorithm handles an imbalanced data set, its clustering stage produces an unbalanced number of hidden-layer neurons per class. As a result, the error function neglects the classes with fewer samples, and classification performance is not ideal.

Two solutions are proposed. The first rebalances the sample sets without changing their distribution structure or density, so that the number of samples in each category tends toward balance. For categories with many samples, an under-sampling algorithm is used, with distance sorting used to choose which samples to eliminate. For categories with few samples, an over-sampling algorithm creates similar samples. The second improves the clustering algorithm by setting a threshold based on each category's proportion of the samples; this threshold varies with the number of samples in the class. In the clustering stage, the number of cluster centers is determined from the threshold, so that the number of hidden-layer neurons per category is relatively balanced.

Simulation experiments compare the two improved algorithms with three related MIML algorithms, and a web page classification system is designed and implemented based on the improved algorithms. The results show that the two improved algorithms achieve better classification performance.
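The abstract does not give formulas for either solution, so the following is only a rough sketch of the two ideas under stated assumptions, not the thesis's actual algorithms. It treats each sample as a plain feature vector (real MIML data consists of bags of instances). The function names `undersample`, `oversample`, and `cluster_counts`, the parameter `alpha`, the even-spacing rule for under-sampling, the SMOTE-style interpolation for over-sampling, and the threshold formula are all hypothetical choices made for illustration.

```python
import math
import random


def centroid(samples):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(samples) for col in zip(*samples)]


def undersample(samples, target):
    """Thin a large class down to `target` samples.

    Assumption: samples are sorted by distance to the class centroid and
    kept at evenly spaced positions in that order, so the kept set roughly
    preserves the spread of the original class.
    """
    if target <= 0:
        return []
    if target >= len(samples):
        return list(samples)
    c = centroid(samples)
    ranked = sorted(samples, key=lambda s: math.dist(s, c))
    step = len(ranked) / target
    return [ranked[int(i * step)] for i in range(target)]


def oversample(samples, target, rng):
    """Grow a small class to `target` samples.

    Assumption: a SMOTE-style rule is used, where each new sample lies on
    the segment between a random sample and its nearest neighbour.
    """
    out = list(samples)
    if len(samples) < 2:
        return out
    while len(out) < target:
        i = rng.randrange(len(samples))
        base = samples[i]
        others = [s for j, s in enumerate(samples) if j != i]
        neighbour = min(others, key=lambda s: math.dist(s, base))
        t = rng.random()
        out.append([b + t * (n - b) for b, n in zip(base, neighbour)])
    return out


def cluster_counts(class_sizes, alpha=0.2):
    """Choose the number of cluster centers (hidden neurons) per class.

    Hypothetical rule: the per-class threshold is alpha scaled by the
    inverse of the class's share of the data (capped at 1), so large
    classes get a smaller fraction of centers and small classes a larger one.
    """
    total = sum(class_sizes.values())
    k = len(class_sizes)
    counts = {}
    for label, n in class_sizes.items():
        share = n / total
        threshold = min(1.0, alpha * (1 / k) / share)
        counts[label] = max(1, min(n, round(threshold * n)))
    return counts


if __name__ == "__main__":
    sizes = {"news": 90, "sports": 10}
    # A fixed fraction alpha would give 18 and 2 centers; the
    # size-dependent threshold gives both classes a similar count.
    print(cluster_counts(sizes))
```

With a fixed fraction, the number of centers per class is proportional to class size, which is the imbalance the abstract describes; making the threshold depend on the class's share is one simple way to even it out.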
Keywords/Search Tags:Web pages classification, Multi-instance Multi-label, Neural Network, Imbalanced samples, Clustering Algorithm