Research Of Latent Semantic Analysis Based On Paragraph

Posted on:2015-05-28

Degree:Master

Type:Thesis

Country:China

Candidate:C Bi

Full Text:PDF

GTID:2298330467468633

Subject:Computer software and theory

Abstract/Summary:

As a statistics-based data mining technique, Latent Semantic Analysis (LSA) is widely applied in fields such as Information Retrieval and Text Categorization. By optimizing the Vector Space Model, it is effective at extracting the latent semantic structure among features, which is based on context and co-occurrence. The technique reduces the dimensionality of the original vector space model, filters noise from texts, and highlights the latent semantic relationships among features by mapping features and documents into a latent semantic space of lower dimension. It breaks the assumption that features are independent and describes texts better.

Current research on Latent Semantic Analysis concentrates mainly on analyzing and optimizing the underlying mathematical model and the feature weights, while research on optimizing the latent semantic space itself is comparatively scarce. Meanwhile, when the technique is applied to text classification, most research focuses on filtering features from classified documents for subsequent work. There is less research on how features affect the construction of the latent semantic space and thereby the classification performance of the system. To address these problems, this paper focuses on how feature co-occurrence affects the latent semantic space when Latent Semantic Analysis uses text paragraphs instead of the original documents.

By studying the principle of feature co-occurrence, analyzing the distribution of features across local context and the whole document, and examining data from a large number of experiments, this paper introduces the concepts and construction methods of sub-documents and fake-documents. By combining these two optimization methods, this paper finds a way to effectively optimize the Latent Semantic Analysis technique: it strengthens reasonable co-occurrence of similar features and weakens unreasonable co-occurrence of dissimilar features.

Building on this research into document-combination optimization of Latent Semantic Analysis, the optimized technique based on document paragraph combination is applied to an LSA patent classification system. Experimental results show that, by combining multiple methods, the final classification precision is about 3.2 percent higher than that of the best baseline model.
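The effect of switching the unit of analysis from whole documents to paragraphs can be illustrated with a minimal Python sketch. It is not the thesis's method: the abstract does not specify how sub-documents and fake-documents are constructed, so the blank-line paragraph split, the toy documents, and the helper names `split_paragraphs`, `term_unit_matrix`, and `cooccurrence` are all assumptions made for illustration. The sketch only compares which feature pairs co-occur in the term-by-unit matrix that LSA would later decompose.

```python
from collections import Counter
from itertools import combinations


def split_paragraphs(document):
    """Split a document into paragraphs on blank lines (an assumed convention)."""
    return [p.strip() for p in document.split("\n\n") if p.strip()]


def term_unit_matrix(units):
    """Build a term-by-unit count matrix; each unit becomes one column."""
    counts = [Counter(u.lower().split()) for u in units]
    vocab = sorted(set().union(*counts))
    matrix = [[c[t] for c in counts] for t in vocab]
    return vocab, matrix


def cooccurrence(vocab, matrix):
    """Count, for each term pair, the units in which both terms appear."""
    pairs = {}
    for (i, a), (j, b) in combinations(enumerate(vocab), 2):
        shared = sum(1 for x, y in zip(matrix[i], matrix[j]) if x and y)
        if shared:
            pairs[(a, b)] = shared
    return pairs


# Hypothetical toy corpus: each document has two topically distinct paragraphs.
documents = [
    "engine piston valve\n\npatent claim filing",
    "engine valve\n\nclaim filing court",
]

# Whole documents as units: unrelated terms co-occur across paragraphs.
doc_vocab, doc_matrix = term_unit_matrix(documents)
doc_pairs = cooccurrence(doc_vocab, doc_matrix)

# Paragraphs as units: only terms sharing a paragraph co-occur.
paragraphs = [p for d in documents for p in split_paragraphs(d)]
par_vocab, par_matrix = term_unit_matrix(paragraphs)
par_pairs = cooccurrence(par_vocab, par_matrix)

print("document-level pairs:", len(doc_pairs))
print("paragraph-level pairs:", len(par_pairs))
```

In this toy corpus, "engine" and "filing" co-occur at document level but never share a paragraph, while "engine" and "valve" co-occur at both levels. A full LSA pipeline would then apply a truncated singular value decomposition to the resulting matrix; that step is omitted here.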

Keywords/Search Tags:

Latent Semantic Analysis, Document Parts, Sub Title, Feature Extraction, Text Categorization

Related items

1	Research Of Title Party News Identification Technology Based On Latent Semantic Analysis
2	The Study Of Feature Extraction Based On Complexion Title
3	Scene Classification Method Based On Statistical Feature Of Regional Research
4	Audio Scene Recognition Based On Probabilistic Latent Semantic Analysis
5	Maximizing The Impact Of The Problem-oriented Theme
6	Image Aesthetic Evaluation Based On Latent Semantic Feature
7	Research And Implementation Of Document Summarization Based On Combined Multi-Feature
8	Research Of Image Feature Extraction Algorithm Based On Latent Low-rank Representation
9	Research And Implementation Of Recommendation Algorithm Based On The PLSA Model
10	Hybrid Prediction Of Time Series Based On Latent Feature Extraction