Research On Some Field Text Information Processing Based On Latent Semantic Analysis

Posted on:2011-12-04

Degree:Master

Type:Thesis

Country:China

Candidate:C C Zheng

GTID:2178360302998271

Subject:Management Science and Engineering

Abstract/Summary:

Traditional text information processing relies on the original high-dimensional feature representation, the assumption that terms are independent, and literal word matching, so it often overlooks the hidden semantic structure of text. As a result, it cannot form correct semantic processing units for documents, authors and research institutions, which harms the efficiency, accuracy and recall of text information processing. Deeper text information processing includes text retrieval, text clustering, text classification, text similarity measurement and correlation mining. Finding an effective method of semantic analysis and representation is therefore of great significance to text information processing and mining.

To address the problems of traditional text information processing, this paper introduces the latent semantic analysis (LSA) model and attempts to extract semantic information through singular value decomposition (SVD), semi-discrete decomposition (SDD), non-negative matrix factorization (NMF) and other dimension reduction methods.

After summarizing the current state of, and open issues in, domestic and international research on text information processing and latent semantic analysis, the paper elaborates on the basic ideas and principles of the LSA model, focusing on the mathematical principles and implementation of SVD, SDD and other semantic analysis methods. These methods are also compared with traditional semantic component extraction methods such as PCA, and the feasibility of applying them to text information processing is explained, in order to make up for the deficiencies of correlation methods in theoretical interpretation. Because there is little domestic research on SDD, the theoretical and experimental study of SDD undertaken here is a meaningful attempt.

The paper then explores the methods and mechanisms of typical text information processing applications in a specific field, based on the latent semantic space. Through comparative experiments with an appropriate evaluation model, it studies the differences in efficiency, precision and recall of text clustering between the traditional approach and the LSA-based approach. On the basis of these experiments, the paper draws some conclusions that offer reference value for choosing a method and for setting the number of semantic dimensions.
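
The SVD-based form of LSA mentioned in the abstract can be illustrated with a minimal sketch. It assumes NumPy and a toy term-document matrix invented for illustration; the names `lsa_embed` and `cosine`, the vocabulary, and the choice of k = 2 are hypothetical and are not taken from the thesis, whose data and implementation are not shown on this page.

```python
import numpy as np


def lsa_embed(term_doc: np.ndarray, k: int) -> np.ndarray:
    """Return k-dimensional latent coordinates, one row per document."""
    # Thin SVD: term_doc = U @ diag(s) @ Vt, singular values in descending order.
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    # Keep the k largest singular values; documents are the columns of diag(s_k) @ Vt_k.
    return (s[:k, None] * Vt[:k, :]).T


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy term-document matrix (illustrative only): rows are terms, columns are documents.
# Document 0 uses "car", document 1 uses "automobile"; both mention "engine".
A = np.array([
    [1, 0, 1, 0, 0],  # car
    [0, 1, 1, 0, 0],  # automobile
    [1, 1, 1, 0, 0],  # engine
    [0, 0, 0, 1, 1],  # flower
    [0, 0, 0, 1, 1],  # petal
], dtype=float)

docs = lsa_embed(A, k=2)

# Literal term matching versus the latent semantic space.
print("raw cosine(d0, d1):", cosine(A[:, 0], A[:, 1]))
print("LSA cosine(d0, d1):", cosine(docs[0], docs[1]))
print("LSA cosine(d0, d3):", cosine(docs[0], docs[3]))
```

In this toy case, documents 0 and 1 share only one literal term, so their raw cosine similarity is 0.5, while in the two-dimensional latent space they become nearly identical and stay orthogonal to the unrelated documents. SDD and NMF would replace the SVD step with a different factorization; they are not sketched here.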

Keywords/Search Tags:

Text Information Processing, Latent Semantic Analysis, Singular Value Decomposition, Semi-Discrete Decomposition, Text Clustering

Related items

1	Research On Text Clustering Algorithm Based On Latent Semantic Indexing
2	Based On Latent Semantic Indexing, Text Classification And Research In Science And Technology Information Retrieval
3	Text Classification Based On Latent Semantic Indexing
4	Chinese Text Clustering Based On Latent Semantic And Its Applications
5	Research On Text Clustering Based On Latent Semantic Analysis And Self-organizing Maps
6	Research On Web Text Categorization Based On Latent Semantic Analysis
7	Research On Text Summarization Based On Latent Semantic Analysis
8	Research Of Network Hotspot Content Classification Based On Improved Singular Value Decomposition And Cosine Theorem
9	Research On Rough Classification Of Academic Papers Based On Topic And Semantic Fingerprint Fusion
10	Study Of Multi-WebPages Automatic Abstracting Based On Latent Semantic Analysis