
Research on Vector-Based Data Clustering Methods

Posted on: 2014-01-18    Degree: Master    Type: Thesis
Country: China    Candidate: Y Li
GTID: 2248330398469585    Subject: Computer software and theory
Abstract/Summary:
Since humanity entered the information age, computers have faced an ever-growing amount of data. With the rapid development of the Internet in particular, the volume of lightweight Web text is already enormous and complex. Clustering plays a very important role in pattern analysis, grouping, decision-making and machine learning, and is applied in fields including data mining, document retrieval, image segmentation and pattern classification. To keep users from getting lost in this sea of information, so that they can find the information they need quickly and accurately or manage massive volumes of text efficiently, the text data must be processed (classification, clustering, feature extraction, etc.), and research into efficient text clustering algorithms is one of the core tasks of such processing. Certain characteristics inherent to text data give text data mining its own problems, such as how to handle huge volumes of data, how to deal with the complex semantics of text, how to improve the accuracy of text data mining, and how to present the results visually.
First, this thesis analyzes the state of clustering research in China and abroad and, drawing on the basic theory and techniques of text clustering, analyzes the advantages and disadvantages of various techniques; it also explores how text vectors are generated. The thesis reviews research in data clustering, covering the components of clustering, the choice of clustering algorithm, types of clustering, data representation, similarity measures between data, representation of clustering results, evaluation of clustering, and so on. It then analyzes the classical clustering algorithms and presents in detail the concepts and basic techniques involved in text clustering.
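The abstract does not specify how text vectors are generated or which classical clustering algorithm is analyzed. As an illustration only, the sketch below assumes a common baseline: TF-IDF weights over lowercase whitespace tokens as sparse text vectors, cosine similarity as the similarity measure, and k-means on those vectors with a deterministic farthest-first initialization. It is not the vector-generation algorithm proposed in the thesis, and every function name here is hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokens, no stemming or stop-word removal.
    return text.lower().split()


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (dict term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))  # document frequency
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Term frequency normalized by document length, times idf = log(N / df).
        vectors.append({t: (c / len(toks)) * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = {}
    for v in vectors:
        for t, w in v.items():
            total[t] = total.get(t, 0.0) + w
    return {t: w / len(vectors) for t, w in total.items()}


def init_centers(vectors, k):
    # Farthest-first: start with the first vector, then repeatedly add the
    # vector whose best similarity to the chosen centers is lowest.
    centers = [dict(vectors[0])]
    while len(centers) < k:
        nxt = min(vectors, key=lambda v: max(cosine(v, c) for c in centers))
        centers.append(dict(nxt))
    return centers


def nearest(v, centers):
    """Index of the most similar center (ties go to the lowest index)."""
    return max(range(len(centers)), key=lambda j: cosine(v, centers[j]))


def kmeans(vectors, k, max_iter=20):
    """Assign each vector to one of k clusters; stops when labels stop changing."""
    centers = init_centers(vectors, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(v, centers) for v in vectors]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centers[j] = centroid(members)
    return labels


if __name__ == "__main__":
    docs = [
        "apple banana fruit", "banana fruit juice", "apple fruit salad",
        "python code compiler", "code compiler bug", "python bug code",
    ]
    print(kmeans(tfidf_vectors(docs), 2))  # [0, 0, 0, 1, 1, 1]
```

Cosine similarity is used here because it compares the direction of term-weight vectors rather than their magnitude, so a longer document is not judged more similar merely for containing more words.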
As one of the core techniques of data clustering, vector generation is very important; this thesis presents a vector-generation algorithm and describes its steps in detail. Experiments using two different algorithms are also presented, and the results indicate that the proposed algorithm achieves high precision in data clustering. In addition, the algorithm can play an important role in text retrieval, so it has relatively high practical value.
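The abstract reports high clustering precision without defining the measure. One common way to score clusters against known categories is purity, with per-cluster precision taken as the share of a cluster's members that belong to its majority category. The sketch below assumes that definition, which may differ from the metric used in the thesis; `labels` holds a predicted cluster id per document and `classes` a hypothetical ground-truth category per document.

```python
from collections import Counter


def _cluster_counts(labels, classes):
    """Map each cluster id to a Counter of the true classes of its members."""
    if len(labels) != len(classes):
        raise ValueError("labels and classes must have the same length")
    counts = {}
    for lab, cls in zip(labels, classes):
        counts.setdefault(lab, Counter())[cls] += 1
    return counts


def cluster_precision(labels, classes):
    """Per-cluster precision: fraction of members in the cluster's majority class."""
    return {
        lab: c.most_common(1)[0][1] / sum(c.values())
        for lab, c in _cluster_counts(labels, classes).items()
    }


def purity(labels, classes):
    """Overall purity: majority-class members summed over clusters, divided by N."""
    counts = _cluster_counts(labels, classes)
    if not labels:
        return 0.0
    return sum(c.most_common(1)[0][1] for c in counts.values()) / len(labels)


if __name__ == "__main__":
    labels = [0, 0, 0, 1, 1, 1]
    classes = ["fruit", "fruit", "code", "code", "code", "code"]
    print(round(purity(labels, classes), 3))  # 0.833
```

Purity rewards clusters dominated by a single category, but it can be inflated by producing many tiny clusters, so it is usually reported alongside the number of clusters.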
Keywords/Search Tags: data clustering, vector, Web text retrieval