Research And Application Of Web Text Information Clustering Algorithm

Posted on:2010-11-06

Degree:Master

Type:Thesis

Country:China

Candidate:S K Zhang

Full Text:PDF

GTID:2178360275973455

Subject:Communication and Information System

Abstract/Summary:


With the rapid development and widespread adoption of the Internet, information resources on the Web have grown enormously, and they contain both useful information and harmful material such as reactionary or superstitious content. How to search this huge body of information quickly and efficiently, so that public opinion on the network can be analyzed and predicted accurately, has therefore become an important and urgent problem. Data mining emerged to address it. Built on database technology and integrating results from several disciplines, including statistics, machine learning and fuzzy logic, data mining studies how to extract valuable information or patterns that are usually implicit and previously unknown in a database. Clustering analysis is a key data mining technique: by comparing similarities and differences among data items, it reveals the inner characteristics and distribution of the data.

After systematically reviewing the development of Web information retrieval, data mining and clustering algorithms, this thesis summarizes the problems of general-purpose clustering algorithms. We attempt to design a clustering algorithm that performs well on Chinese Web text and to implement a Web text information clustering system. The main contributions and innovations of this thesis are as follows:

(1) The advantages and disadvantages of several common families of clustering algorithms are analyzed, including partitioning, hierarchical, density-based, grid-based and model-based methods. By evaluating the efficiency of these algorithms, we identify their shortcomings when applied to Web information clustering.
(2) Several key technologies of Chinese network information clustering are studied, including word segmentation, text representation, dimensionality reduction, term weighting and similarity calculation.
(3) We focus on the Suffix Tree Clustering algorithm and improve it, using a binary search tree, for Chinese network information clustering.
(4) Based on experimental results, K-means, Suffix Tree Clustering and the improved Suffix Tree Clustering are compared in terms of accuracy and time complexity.
(5) We design and implement a Chinese network information clustering system based on the improved Suffix Tree Clustering algorithm, which the experimental results show to be efficient and feasible.
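
For readers unfamiliar with Suffix Tree Clustering, the following is a minimal Python sketch of the idea behind the standard algorithm, not of the improved algorithm described in this thesis, whose details the abstract does not give. Documents that share a phrase form a "base cluster", and base clusters whose document sets overlap by more than half in both directions are merged. The sketch makes several assumptions: a brute-force phrase index stands in for the suffix tree, the function names base_clusters and merge_clusters and the parameters max_len, min_docs and threshold are hypothetical, and documents are assumed to be already segmented into words (for Chinese text, a word segmentation step would come first).

```python
from collections import defaultdict
from itertools import combinations


def base_clusters(docs, max_len=3, min_docs=2):
    """Map each phrase (tuple of words) to the ids of the documents containing it.

    A real implementation would read these sets off a generalized suffix
    tree; enumerating every phrase of up to max_len words is a simple
    stand-in that gives the same sets for short phrases.
    """
    index = defaultdict(set)
    for doc_id, words in enumerate(docs):
        for start in range(len(words)):
            last = min(start + max_len, len(words))
            for end in range(start + 1, last + 1):
                index[tuple(words[start:end])].add(doc_id)
    # Keep only phrases shared by at least min_docs documents.
    return {p: ids for p, ids in index.items() if len(ids) >= min_docs}


def merge_clusters(clusters, threshold=0.5):
    """Merge base clusters whose document sets overlap strongly.

    Two base clusters are linked when the shared documents make up more
    than `threshold` of each of them; linked clusters are unioned.
    """
    phrases = list(clusters)
    parent = {p: p for p in phrases}

    def find(p):
        # Union-find lookup with path halving.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in combinations(phrases, 2):
        common = len(clusters[a] & clusters[b])
        if (common / len(clusters[a]) > threshold
                and common / len(clusters[b]) > threshold):
            parent[find(a)] = find(b)

    merged = defaultdict(set)
    for p in phrases:
        merged[find(p)] |= clusters[p]
    return sorted(sorted(ids) for ids in merged.values())


if __name__ == "__main__":
    docs = [
        ["suffix", "tree", "clustering"],
        ["suffix", "tree", "index"],
        ["k", "means", "centroid"],
        ["k", "means", "update"],
    ]
    print(merge_clusters(base_clusters(docs)))
```

Because clusters are built from shared phrases rather than from distances to a centroid, the number of clusters does not have to be fixed in advance, which is one practical difference from K-means.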

Keywords/Search Tags:

Data Mining, Network Information Retrieval, Clustering Algorithm, Suffix Tree Clustering


Related items

1	Text Clustering And Its Application In Web Community Search Engine
2	Sensor Network Big Data Classification Processing Based On Suffix Tree Clustering
3	Design And Implementation Of Suffix Tree Based Uyghur Web Page Clustering Algorithm
4	Research On Vietnamese News Topic Recognition Method Based On Suffix Tree Clustering Algorithm
5	Research On The Methods Of Web Text Mining For Information Retrieval
6	Research On Web Document Clustering Approaches Based On Phrase Features
7	Chinese Text Clustering Algorithm Based On Suffix Tree Research
8	Design And Implementation Of Meta Search Engine Based On Suffix Tree Clustering
9	The Application Of Suffix Tree Clustering Algorithm In Meta Search Engine
10	The Research On Dynamic And Abstract Clustering Method Of High Dimensional Sparse Data