Research And Design Of Web Classification Algorithm Based On Education Browser

Posted on: 2019-02-05
Degree: Master
Type: Thesis
Country: China
Candidate: T B A He
Full Text: PDF
GTID: 2428330548967113
Subject: Computer application technology
Abstract/Summary:
With the rapid growth of the Internet, network resources are becoming more and more abundant, and the Internet has become the main channel through which people obtain information and resources. Search engines play an important role in information retrieval on the Internet, but they still cannot fully meet people's demands in terms of search efficiency and the accuracy of search results. In addition, the Internet is flooded with unhealthy content involving pornography, violence, gambling, or drugs. Filtering out this harmful information and creating a clean and secure network environment poses a challenge to search engines.

Web page categorization offers a new way to address these problems. When searching for information in a mass of data, a web page tag that represents the page's own characteristics helps improve retrieval efficiency and accuracy. At the same time, irrelevant pages and pages containing illegal or harmful information can be filtered out by identifying their tags, which improves filtering accuracy. Building on the educational browser being developed by the author's project team, this research explores web page classification techniques in search of an efficient categorization algorithm. The main content of the research includes:

1. Investigating the current state of research on and application of web page classification in China and abroad, and clarifying the related technical basis and research methods, including the general process of text categorization and word segmentation.
2. Studying several key mechanisms in web page classification, including how to write targeted web crawlers to obtain web page information, how to preprocess web pages to extract their text content, how to apply Chinese word segmentation to web page text, and how to perform feature extraction on the processed text.
3. Designing and implementing a web page classification algorithm. In addition to naive Bayes and SVM (support vector machine), the thesis introduces the random forest algorithm, an emerging machine learning algorithm, into the research, and proposes a modified version aimed at categorization, named the Semi-random forest algorithm. Experiments on the three classification algorithms show that the improved Semi-random forest algorithm achieves better classification results and is structurally simpler than SVM.

This study not only enriches the functionality of the educational browser, but also lays the foundation for intelligent services and applications based on educational browsers, such as user behavior analysis and personalized content recommendation.
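
As a rough illustration of the pipeline outlined in the abstract (extract page text, segment it into tokens, classify), the following Python sketch uses only the standard library. It is not the thesis's implementation. All names (`TextExtractor`, `html_to_text`, `tokenize`, `NaiveBayes`, `PAGES`) and the sample pages are hypothetical. Overlapping character bigrams stand in for a real Chinese word segmenter, and a small multinomial naive Bayes classifier with add-one smoothing stands in for the classifiers compared in the thesis. The Semi-random forest algorithm is not specified in the abstract and is not reproduced here.

```python
import math
import re
from collections import Counter, defaultdict
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the bodies of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Preprocessing step: reduce an HTML page to its text content."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def tokenize(text):
    """Assumed stand-in for word segmentation: lowercase word tokens for
    Latin text, overlapping character bigrams for runs of CJK characters."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    for run in re.findall(r"[\u4e00-\u9fff]+", text):
        if len(run) == 1:
            tokens.append(run)
        tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
    return tokens


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokens:
                if tok in self.vocab:  # ignore tokens never seen in training
                    score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training pages; a real system would use crawled data.
PAGES = [
    ("<html><body><h1>Algebra lesson</h1><p>Solve the equation for x.</p></body></html>", "education"),
    ("<html><body><p>Geometry homework: triangle and circle exercises</p></body></html>", "education"),
    ("<html><body><p>Casino bonus! Bet now and win the jackpot</p></body></html>", "blocked"),
    ("<html><body><script>var x=1;</script><p>Poker bet odds, jackpot casino</p></body></html>", "blocked"),
]

model = NaiveBayes().fit(
    [tokenize(html_to_text(html)) for html, _ in PAGES],
    [label for _, label in PAGES],
)
print(model.predict(tokenize(html_to_text("<p>Win the casino jackpot</p>"))))
```

In a browser-side filter of the kind the abstract describes, the predicted label would play the role of the page tag: pages tagged with a blocked category could be filtered, while the remaining tags could support retrieval. A production system would replace the bigram tokenizer with a proper segmenter and add feature selection before classification.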
Keywords/Search Tags: Web Classification, Naive Bayes, SVM, Random Forest