Improved K-Nearest Neighbor Arithmetics On The Web Page Classification

Posted on:2011-06-08

Degree:Master

Type:Thesis

Country:China

Candidate:F Bai

GTID:2178360305472737

Subject:Computer software and theory

Abstract/Summary:

With the rapid development of modern communications technology and the popularization of the Internet, the World Wide Web has become the most significant and most widely distributed information service center since 2001. More than 3 billion pages are currently online, and millions of new pages are added every day. Given this explosive growth of network information, how to gain effective access to useful, interesting information is an important issue for modern information research. Addressing this problem by combining the heterogeneous, unstructured nature of Web information with traditional data mining techniques gives rise to a new and challenging field, namely Web mining. Its purpose is to find and analyse useful information on the WWW.

The main work of this paper is to study classification algorithms in Web mining. The work includes the following:

1. The paper briefly introduces the background of text classification research and its current state, then explains the ideas behind algorithms commonly used in text classification, including the Naive Bayes algorithm, Support Vector Machines, Decision Trees and the K-Nearest Neighbor algorithm.
2. It briefly introduces some key technologies needed for Web text classification: web content extraction, word segmentation, stop-word processing and automatic feature extraction.
3. It analyses the advantages and disadvantages of the K-Nearest Neighbor algorithm. To address the shortcomings of K-Nearest Neighbor as a lazy learning algorithm, it proposes two modified algorithms. Because the traditional similarity formula has defects when applied to web page classification, it also puts forward an improved similarity measure.

In experiments, we implement both the improved and the traditional K-Nearest Neighbor algorithms and find that each improvement delivers gains in the aspect it targets. The improved algorithm can be effectively applied to Web data mining, information retrieval and other application areas.
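
For orientation, the traditional K-Nearest Neighbor classifier that the abstract takes as its starting point can be sketched as below. This is only a minimal baseline, assuming cosine similarity over raw term-frequency vectors and similarity-weighted voting; it is not the thesis's improved algorithm or improved similarity measure, neither of which the abstract specifies. The function names (`tf_vector`, `cosine_similarity`, `knn_classify`) and the toy training data are hypothetical, and whitespace tokenization stands in for the word segmentation and stop-word steps mentioned above.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector (assumption: whitespace tokenization)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(query, training, k=3):
    """Label `query` by a similarity-weighted vote among its k nearest documents.

    `training` is a list of (text, label) pairs. Every training document is
    compared at query time, which is the "lazy" cost the abstract refers to.
    """
    q = tf_vector(query)
    scored = [(cosine_similarity(q, tf_vector(text)), label)
              for text, label in training]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = Counter()
    for sim, label in scored[:k]:
        votes[label] += sim
    return votes.most_common(1)[0][0]


# Hypothetical toy data, for illustration only.
training = [
    ("football match goal team", "sports"),
    ("team wins league match", "sports"),
    ("stock market shares price", "finance"),
    ("bank interest rate market", "finance"),
]
print(knn_classify("team scores goal in match", training, k=3))  # prints "sports"
```

Because nothing is precomputed, each classification scans the whole training set; this per-query cost is presumably what the modified algorithms aim to reduce, though the abstract does not say how.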

Keywords/Search Tags:

KNN algorithm, similarity, web page classification

Related items

1	Improved K-Nearest Neighbor Arithmetics On The Web Page Classification
2	Research On Webpage Recognition Technology Based On Vision And Semantics
3	Research And Implement Of Topic Oriented Web Page Classification Technique
4	Research And Implementation On A Web Page Classification System
5	Web Structure Mining Based On The Maximum Flow And Page Similarity
6	Page Ranking Algorithm Based On Link Similarity Study
7	Research On Improved KNN Chinese Web Page Classification Based On Weka Platform
8	Research And Implementation Of Content Oriented Web Page Classification
9	Chinese Web Page Classification Based On Web Page Features
10	Research On Vision Based Algorithm In Chinese Web-Page Classification