
Study And Implementation On Latent Semantic Space Analysis And Web Document Clustering Based On LDA

Posted on: 2011-05-08
Degree: Master
Type: Thesis
Country: China
Candidate: Z L Liu
GTID: 2248330395958010
Subject: Computer software and theory
Abstract/Summary:
With the development of the Internet and search engine technology, Web document clustering has become an important technique for improving Web search and personalized services, and the way Web documents are represented is a key factor in clustering quality. Currently, weaknesses in the document representation model and in the design and implementation of clustering algorithms lead to poor clustering quality that cannot meet user needs. Based on a study of Web document representation models and the Web document clustering process, this thesis proposes an approach to latent semantic space partition and Web document clustering. First, the LDA model is applied to analyze the latent semantics of documents, and the semantic space is partitioned into low-frequency, middle-frequency, and high-frequency semantic spaces. The semantics in the low-frequency space are used to detect outlier Web documents. The semantics in the middle- and high-frequency spaces serve as document features for clustering. Clustering performance is improved by a mutual-action mechanism between document clusters and semantics. Compared with related work, this thesis not only applies the LDA model to represent documents but also partitions the semantic distribution space in depth and applies the results of that analysis to Web document clustering.
Experiments show that the proposed algorithm, which is based on LDA document features and the semantic mutual-action mechanism, achieves better clustering accuracy than clustering algorithms that use traditional terms or semantics extracted by PLSA as features, and that outliers can be detected and located accurately before clustering runs.
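The partition step described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract gives no thresholds or decision rules, so the cut-offs `low_cut` and `high_cut`, the `mass_threshold` outlier rule, the function names, and the toy document-topic matrix `theta` are all assumptions made for illustration. The sketch also assumes the document-topic distributions have already been estimated by an LDA model, and it leaves out the mutual-action mechanism between clusters and semantics.

```python
from statistics import mean

def partition_topics(theta, low_cut, high_cut):
    """Split topic indices into low/middle/high bands by mean weight over documents.

    theta is a list of per-document topic distributions (rows sum to 1).
    The two cut-offs are illustrative assumptions, not values from the thesis.
    """
    n_topics = len(theta[0])
    freq = [mean(doc[k] for doc in theta) for k in range(n_topics)]
    low = [k for k, f in enumerate(freq) if f < low_cut]
    middle = [k for k, f in enumerate(freq) if low_cut <= f < high_cut]
    high = [k for k, f in enumerate(freq) if f >= high_cut]
    return low, middle, high

def find_outliers(theta, low_topics, mass_threshold=0.5):
    """Flag documents whose probability mass lies mostly on low-frequency topics."""
    return [i for i, doc in enumerate(theta)
            if sum(doc[k] for k in low_topics) > mass_threshold]

def clustering_features(theta, keep_topics, outliers):
    """Renormalized middle/high-frequency topic weights for non-outlier documents."""
    skip = set(outliers)
    features = {}
    for i, doc in enumerate(theta):
        if i in skip:
            continue
        kept = [doc[k] for k in keep_topics]
        total = sum(kept)
        features[i] = [w / total for w in kept] if total > 0 else kept
    return features

# Toy document-topic matrix: 5 documents, 4 topics (made-up numbers).
theta = [
    [0.70, 0.20, 0.08, 0.02],
    [0.60, 0.30, 0.08, 0.02],
    [0.10, 0.80, 0.08, 0.02],
    [0.15, 0.75, 0.08, 0.02],
    [0.05, 0.05, 0.10, 0.80],
]

low, middle, high = partition_topics(theta, low_cut=0.2, high_cut=0.4)
outliers = find_outliers(theta, low)
features = clustering_features(theta, middle + high, outliers)
```

The resulting `features` dictionary maps each remaining document index to a feature vector that could be passed to any clustering algorithm. Detecting outliers first, as the abstract describes, keeps documents dominated by rare topics from distorting the clusters.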
Keywords/Search Tags:LDA, latent semantic, semantic distribution, document clustering