
Research On Some Field Text Information Processing Based On Latent Semantic Analysis

Posted on: 2011-12-04  Degree: Master  Type: Thesis
Country: China  Candidate: C C Zheng
GTID: 2178360302998271  Subject: Management Science and Engineering
Abstract/Summary:
Traditional text information processing relies on the original high-dimensional feature representation, on the assumption that terms are independent, and on literal word matching. As a result, it often overlooks the latent semantic structure of text and cannot form correct semantic processing units for documents, authors and research institutions, which lowers the efficiency, precision and recall of text information processing. In-depth text information processing includes text retrieval, text clustering, text classification, text similarity measurement and correlation mining, so finding an effective method of semantic analysis and representation is of great significance for text information processing and mining.

To address the problems of traditional text information processing, this thesis introduces the latent semantic analysis (LSA) model and tries to capture semantic information through singular value decomposition (SVD), semi-discrete decomposition (SDD), non-negative matrix factorization (NMF) and other dimensionality reduction methods.

Building on a summary of the state of domestic and international research, and of the open issues in text information processing and latent semantic analysis, the thesis elaborates on the basic ideas and principles of the LSA model, focusing on the mathematical principles and implementation of SVD, SDD and other semantic analysis methods. These methods are also compared with traditional methods for extracting semantic components, such as principal component analysis (PCA), and the feasibility of applying them to text information processing is explained, making up for the weak theoretical interpretation of related methods. Because there is little domestic research on SDD, the theoretical and experimental study of SDD in this thesis is a meaningful attempt.

The thesis then explores typical applications, methods and mechanisms of text information processing in a specific field based on the latent semantic space.
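
The two decompositions the abstract focuses on, SVD and SDD, can be sketched in a few lines of NumPy. This is not the thesis's code: it is a minimal sketch assuming a small dense term-by-document matrix `A` (rows are terms, columns are documents). `lsa_svd` computes a rank-k LSA with `numpy.linalg.svd`; `sdd` builds an approximation A ~ X diag(d) Y^T with X and Y restricted to {-1, 0, 1} using a greedy alternating heuristic, one term at a time. The function names, the starting vector for each SDD term and the toy matrix are hypothetical choices made for illustration. PCA, by contrast, would apply the same SVD to a mean-centred matrix.

```python
import numpy as np

def lsa_svd(A, k):
    """Rank-k LSA of a term-by-document matrix A (rows = terms, columns = documents).

    Returns U_k, the k largest singular values, V_k^T, and the k-dimensional
    document vectors (rows of V_k S_k) used for similarity and clustering.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    Uk, sk, Vtk = U[:, :k], s[:k], Vt[:k, :]
    doc_vectors = Vtk.T * sk          # scale each latent dimension by its singular value
    return Uk, sk, Vtk, doc_vectors

def _best_ternary(s):
    """Return x in {-1, 0, 1}^len(s) maximizing (x @ s)**2 / (x @ x).

    The optimum keeps the J entries of s with the largest |s_i| and sets
    x_i = sign(s_i); J is found by checking every prefix of the sorted magnitudes.
    """
    order = np.argsort(-np.abs(s))
    prefix = np.cumsum(np.abs(s)[order])
    J = int(np.argmax(prefix ** 2 / np.arange(1, len(s) + 1))) + 1
    x = np.zeros_like(s)
    x[order[:J]] = np.sign(s[order[:J]])
    return x

def sdd(A, k, inner_iters=10):
    """Greedy semi-discrete decomposition A ~ X diag(d) Y^T, with X, Y in {-1, 0, 1}."""
    R = np.array(A, dtype=float)       # residual, updated after each term
    m, n = R.shape
    X, d, Y = np.zeros((m, k)), np.zeros(k), np.zeros((n, k))
    for i in range(k):
        # Hypothetical start: y selects the residual column with the largest norm.
        y = np.zeros(n)
        y[np.argmax(np.linalg.norm(R, axis=0))] = 1.0
        for _ in range(inner_iters):   # alternate: best x for fixed y, then best y for fixed x
            x = _best_ternary(R @ y)
            if not x.any():
                break
            y = _best_ternary(R.T @ x)
            if not y.any():
                break
        if not x.any() or not y.any():  # residual is zero: nothing left to approximate
            break
        d[i] = (x @ R @ y) / ((x @ x) * (y @ y))   # optimal weight for this x, y
        R -= d[i] * np.outer(x, y)
        X[:, i], Y[:, i] = x, y
    return X, d, Y

# Hypothetical 5-term x 4-document count matrix
A = np.array([[1, 1, 0, 0], [1, 0, 1, 0], [0, 1, 1, 0],
              [0, 0, 1, 1], [0, 0, 0, 1]], dtype=float)
Uk, sk, Vtk, doc_vecs = lsa_svd(A, 2)
X, d, Y = sdd(A, 3)
print(np.linalg.norm(A - Uk @ np.diag(sk) @ Vtk), np.linalg.norm(A - X @ np.diag(d) @ Y.T))
```

Each SDD term can only lower the residual norm, and by the Eckart-Young theorem no k-term SDD can beat the rank-k SVD error. The trade-off is storage: every entry of X and Y fits in 2 bits, whereas SVD factors need full floating-point values.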
Through comparative experiments that use an appropriate evaluation model, the thesis studies how text clustering based on LSA differs from the traditional approach in efficiency, precision and recall. From these experiments it draws several interesting conclusions that offer reference value for choosing a method and for setting the number of semantic dimensions.
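
A comparison of this kind, clustering in the traditional term space versus in a reduced LSA space, can be sketched as follows. The abstract does not name the clustering algorithm or the evaluation model, so this sketch assumes plain k-means with a deterministic farthest-point start and pair-counting precision and recall (a pair of documents counts as positive when both land in the same cluster). The helper names, the toy corpus and its topic labels are hypothetical; the `dims` parameter plays the role of the "semantic dimensions" setting.

```python
import numpy as np
from itertools import combinations

def kmeans(X, k, iters=50):
    """Lloyd's k-means on the rows of X, with a deterministic farthest-point start."""
    centers = [X[0]]
    for _ in range(1, k):
        # next center: the point farthest from all centers chosen so far
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2), axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def cluster_docs(A, n_clusters, dims=None):
    """Cluster the documents (columns of A) in raw term space, or in a dims-dimensional LSA space."""
    if dims is None:
        D = A.T.astype(float)
    else:
        _, s, Vt = np.linalg.svd(A, full_matrices=False)
        D = Vt[:dims].T * s[:dims]                  # document vectors V_k S_k
    # L2-normalize so that Euclidean k-means tracks cosine similarity
    D = D / np.maximum(np.linalg.norm(D, axis=1, keepdims=True), 1e-12)
    return kmeans(D, n_clusters)

def pairwise_precision_recall(pred, truth):
    """Pair-counting precision/recall over all document pairs."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(truth)), 2):
        same_pred, same_true = pred[i] == pred[j], truth[i] == truth[j]
        tp += int(same_pred and same_true)
        fp += int(same_pred and not same_true)
        fn += int(same_true and not same_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical toy corpus with two known topics (labels 0 and 1)
docs = ["svd matrix decomposition rank", "matrix rank singular values",
        "singular decomposition matrix", "football match goal",
        "goal keeper match", "football team goal"]
truth = [0, 0, 0, 1, 1, 1]
vocab = sorted({w for doc in docs for w in doc.split()})
A = np.array([[doc.split().count(t) for doc in docs] for t in vocab], dtype=float)

for dims in (None, 2):                              # None = traditional term space
    pred = cluster_docs(A, 2, dims)
    print("raw" if dims is None else f"LSA k={dims}", pairwise_precision_recall(pred, truth))
```

On a real corpus one would sweep `dims` over several values and time each run, which covers the efficiency side of the comparison as well as precision and recall.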
Keywords/Search Tags: Text Information Processing, Latent Semantic Analysis, Singular Value Decomposition, Semi-Discrete Decomposition, Text Clustering