Research And Application Of Web Page Classification And Filtering Based On Web Content Mining

Posted on:2004-06-07

Degree:Master

Type:Thesis

Country:China

Candidate:X H Peng

Full Text:PDF

GTID:2208360182968579

Subject:Computer application technology

Abstract/Summary:

The World Wide Web has become a vast global information service, covering news, finance and economics, advertising, commerce, culture, education, and more. Faced with a Web this large and complex, many users find that their ability falls short of their ambition. Helping users find the resources they are interested in has become an urgent problem.

The author designed and developed the CSUIHWD system to serve the goals of the Central South University (CSU) campus information harbor. CSUIHWD gathers web pages from the sites users are interested in, filters them, classifies them automatically by predefined topics, and publishes the classified pages on the CSU web portal. In this way CSUIHWD supplies the CSU portal with additional resources, makes fuller use of resources on the Internet, and lays a stable foundation for later building a Chinese intelligent search engine.

This thesis first introduces the basic concepts, methods, and techniques of data mining and web mining: what they are, why mining is needed, and what advantages it offers. It also introduces web page classification and filtering techniques and the CSUIHWD system prototype.

It then studies the key techniques of web content classification mining: gathering web page data, word segmentation, and building the classifier. Page gathering is performed by CsuRobot, a robot-style program that collects web page data automatically. CsuRobot is multithreaded and can run several gathering tasks at the same time. The author improved the reverse maximum matching segmentation algorithm and designed a reverse segmentation dictionary; the improved algorithm segments faster. A statistical method based on high-frequency words partly solves the problem of words that are not registered in the dictionary. The Naive Bayes classifier does not take the semi-structured nature of web pages into account and treats all words equally. This thesis gives greater weight to words that make an additional contribution, thereby improving the Naive Bayes classifier. Experiments show that the improvement is helpful.

Finally, the thesis summarizes the work and points out directions for further research.
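
As a rough illustration of the reverse maximum matching idea mentioned in the abstract, the Python sketch below scans a sentence from right to left and, at each step, takes the longest dictionary word that ends at the current position. It shows only the textbook form of the algorithm, not the author's improved version or reverse segmentation dictionary. The function name `reverse_max_match`, the `max_len` parameter, and the toy dictionary are assumptions made for this example.

```python
def reverse_max_match(text, dictionary, max_len=4):
    """Segment `text` right to left, preferring the longest dictionary word.

    `dictionary` is any container supporting `in` (a set is assumed here);
    `max_len` is the longest word length to try, an illustrative parameter.
    """
    words = []
    end = len(text)
    while end > 0:
        # Try the longest candidate ending at `end` first, then shrink it.
        for size in range(min(max_len, end), 0, -1):
            candidate = text[end - size:end]
            # A single character is always accepted as a fallback, so
            # characters missing from the dictionary still produce output.
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                end -= size
                break
    # Words were collected from the end of the text, so restore reading order.
    words.reverse()
    return words


# Hypothetical toy dictionary, for illustration only.
toy_dictionary = {"研究", "研究生", "生命", "起源"}
print(reverse_max_match("研究生命起源", toy_dictionary))
# → ['研究', '生命', '起源']
```

On this input a left-to-right longest match would take 研究生 first and leave 命 stranded, which is one common reason for scanning from the end of the sentence.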

Keywords/Search Tags:

data mining, web mining, segmentation, class, robot

Related items

1	A Customer Segmentation Research Based On Data Mining In Telecommunications Business
2	The archaeology of Hickneytown: An examination of class identity in a late nineteenth century mining settlement in the Spruce Mountain Mining District
3	The Research Of Class Imbalance Classification Model In Data Mining
4	The Class-Mean Method And Its Extensions To Handling Incomplete Data In Data Mining
5	Study And Implementation On Techniques Of Parallel Mining Of Frequent Closed Sequences Based On Vertical Segmentation
6	The Research And Implement Of Algorithm On Web Usage Mining
7	Applications Of Data Mining For The Competitive Intelligence System In The Enterprise
8	The Design And Application Of Association Mining Matrix Algorithm Based On Equivalence Class
9	Web-based Data Mining Technology
10	Design And Implementation Of Flying Overhead Power Line Inspection Robot Video Evaluation Mining System