WEB Mining System

Posted on:2008-01-09

Degree:Master

Type:Thesis

Country:China

Candidate:M J Guan

Full Text:PDF

GTID:2178360215491308

Subject:Management Science and Engineering

Abstract/Summary:

The rapid development of the Internet has led to rapid growth in online information. The "information explosion" can no longer be ignored: it has created serious problems, particularly the difficulty of reaching the knowledge buried in that information. Currently, 300 million WEB pages form a huge distributed information space that contains abundant knowledge resources. This thesis studies WEB information collection, WEB page purification, text clustering and Chinese word segmentation, as follows.
(1) Starting from the theory of website information acquisition, the useful algorithms currently available in this field are studied and compared.
(2) Handling network information efficiently requires purifying WEB pages. The elementary principles of WEB page purification are explained, and various purification technologies are analyzed.
(3) A brand-new WEB page purification algorithm based on the DOM tree is proposed. It works by comparing the DOM trees of pages from the same website, because the noise in pages from the same website is relatively similar.
(4) Popular domestic segmentation algorithms are compared, including segmentation based on dictionary matching, segmentation based on statistical word frequency, and segmentation based on knowledge of the word.
(5) How the feature vector of a WEB document is built with the vector space model is described in detail.
(6) Two typical clustering algorithms, the k-means algorithm and the SOM (self-organizing map) algorithm, are implemented.
(7) Finally, a novel WEB clustering algorithm, named the projection WEB clustering algorithm, is put forward.
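The abstract does not say how the document feature vectors in item (5) are weighted. The following is only a minimal sketch of the vector space model, assuming TF-IDF weights and cosine similarity, which are common choices and not necessarily the ones used in the thesis. The function names and the sample documents are hypothetical, and the documents are assumed to be already segmented into words.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one {term: weight} dict per tokenized document (assumed TF-IDF weighting)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical, already-segmented documents used only for illustration.
docs = [
    ["web", "mining", "page", "clustering"],
    ["web", "page", "purification", "dom"],
    ["chinese", "word", "segmentation", "dictionary"],
]
vectors = tf_idf_vectors(docs)
sim_01 = cosine(vectors[0], vectors[1])  # documents sharing "web" and "page"
sim_02 = cosine(vectors[0], vectors[2])  # documents with no shared terms
```

Vectors of this kind are what a clustering algorithm such as k-means would take as input: documents that share distinctive terms end up closer together than documents that share none.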

Keywords/Search Tags:

WEB Text Mining, Page collection, Page purification, Chinese word segmentation, WEB clustering

Related items

1	Research On Mining Structure Of WEB Page For Information Extraction
2	Study On Web Data Processing Technology
3	The Optimization And Implement Of Enterprise Search Engine
4	Research And Implementation Of Chinese Web-page Classification Based On Web Data-mining
5	Integrating automatic Web page clustering into Web log association mining
6	The Research Of Automatic Chinese Web Page Categorization Based On Support Vector Machine
7	Data Mining Research In Web Information Retrieval And Classification
8	Web Page-oriented Handheld Devices Automatically Cutting Technology Research
9	Chinese Web Page Classification Based On Web Page Features
10	Space Tile-based Chinese Page Segmentation System