Research And Implementation Of Automatic Classification System And Key Technologies On Chinese Web Page

Posted on:2014-07-04

Degree:Master

Type:Thesis

Country:China

Candidate:J Zhou

GTID:2308330479479220

Subject:Software engineering

Abstract/Summary:

With the development of information technology, the volume of data on the Internet is growing dramatically. One effective way to organize and manage this mass of data is to classify web pages by their text. Because page content is so varied, traditional data-mining classification methods do not perform well, so the main research direction is to establish a more practical method for this problem. This thesis studies the problem, points out the deficiencies in commonly used classification methods, and proposes solutions. The contributions and related work are as follows. First, we survey web page classification theory, including the classification process, web page representation models, Chinese word segmentation, and feature extraction methods. Second, we propose a model based on Labeled LDA to handle pages whose content contains few words. Third, we propose a pre-classification algorithm for news pages that cannot be classified precisely because their content is mixed and disorganized. Fourth, we design a new classification architecture that combines all of the ideas above; experiments show that this architecture improves accuracy by 0.5%-1%. Finally, we analyze the remaining shortcomings and suggest directions for further improvement.
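
The idea of expanding the feature vector of a page with few words can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the `TOPIC_WORDS` table, its weights, the `expand_features` function, and the `min_tokens` and `top_k` parameters are all hypothetical, and the table merely stands in for the per-label word distributions that a trained label-topic model would supply. The tokens are assumed to be already segmented.

```python
from collections import Counter

# Hypothetical label -> word weights, standing in for the per-label
# topic-word distributions of a trained model. Values are made up.
TOPIC_WORDS = {
    "sports": {"match": 0.30, "team": 0.25, "goal": 0.20, "league": 0.15},
    "finance": {"stock": 0.30, "market": 0.25, "bank": 0.20, "fund": 0.15},
}


def expand_features(tokens, topic_words=TOPIC_WORDS, min_tokens=5, top_k=2):
    """Return a term-weight dict; short pages gain terms from the best-matching label."""
    features = {term: float(count) for term, count in Counter(tokens).items()}
    if len(tokens) >= min_tokens:
        return features  # long enough: leave the vector unchanged

    # Score each label by the total weight of its words that occur in the page.
    scores = {
        label: sum(w for word, w in words.items() if word in features)
        for label, words in topic_words.items()
    }
    best = max(scores, key=scores.get)
    if scores[best] == 0:
        return features  # no overlap with any label: nothing to expand with

    # Add the label's top_k strongest words not already present, at their topic weight.
    extra = sorted(
        ((w, word) for word, w in topic_words[best].items() if word not in features),
        reverse=True,
    )
    for w, word in extra[:top_k]:
        features[word] = w
    return features


if __name__ == "__main__":
    print(expand_features(["team", "goal"]))
```

A page with only the tokens `team` and `goal` overlaps the hypothetical `sports` label, so its vector gains that label's strongest missing words; a page at or above `min_tokens` is returned unchanged. The expanded vector would then be passed to whichever classifier the pipeline uses.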

Keywords/Search Tags:

WebPage Classification, Pre-Classification, Feature Vector Expansion, Induction Model, Classification Architecture

Related items

1	Research On Classification Algorithm For Chinese Webpage
2	Research And Implementation Of Web Page Classification Based On CNN And SVM
3	Research Of Webpage Classification Model Based On URL And Content
4	A Research On Large Scale Automatic Chinese Webpages Classification
5	Malicious Web Page Detection System Based On Classification Algorithm
6	Research And Application Of Distributed Webpage Automatic Classification Algorithm Based On Bayes
7	On The Design And Implementation Of Automatic Webpage Classification Algorithm
8	Semi-supervised Webpage Classification
9	A Study Of Subject Web Classification Algorithm Based On Machine Learning
10	Research On Webpage Classification Algorithm Based On Deep Learning