Application Of The SVD In Text Classification

Posted on: 2013-04-14
Degree: Master
Type: Thesis
Country: China
Candidate: L L He
GTID: 2248330395975584
Subject: Software engineering
Abstract/Summary:
With the advance of information technology and the growing popularity of the Internet, the amount of available information is increasing rapidly. How to quickly find and obtain the required information in this vast information space has become one of the most fundamental problems of the new information age. Fast categorization of massive text collections is an important research direction in data mining, and feature dimension reduction is the key to fast text categorization.

The main subject of this thesis is the application of SVD (Singular Value Decomposition) to text categorization. It covers the implementation and optimization of the SVD algorithm, the proposal and verification of a percentage-based method for selecting the K value of the SVD, and a comparison of the KNN and SVD+KNN text categorization algorithms. The initial feature-text matrices in text categorization have high dimensionality, which makes computation inconvenient. Simple feature selection methods reduce the dimensionality but cannot solve the problem of synonymy and polysemy. Feature extraction methods such as latent semantic analysis instead transform the feature space according to semantic features, forming a new semantic space and eliminating "noise". The SVD algorithm is a very important tool for feature extraction. In latent semantic analysis, the K-value selection strategy determines the effectiveness of the SVD algorithm.
In this thesis, I adopt a percentage-based K-value selection strategy, which can improve the precision and speed of text categorization. I also propose the concept of fitting, which can be used to directly compare the degree of similarity between two matrices.

I carried out a large number of experiments, verifying the optimization effect of OpenMP technology on the SVD algorithm; verifying that the K-value percentage strategy achieves good dimension reduction on the feature-text matrix while keeping a high degree of fitting to the initial matrix; and verifying that the SVD algorithm can solve the problem of synonymy and polysemy well while also improving text classification accuracy and speed.
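
The abstract does not define the percentage rule or the fitting measure, so the following is only a minimal sketch of an SVD+KNN pipeline under stated assumptions, not the thesis's implementation. It assumes that K is the smallest rank whose leading singular values reach a given share of their total sum, and that fitting is one minus the relative Frobenius error between the initial matrix and its rank-K approximation. The function names, the toy matrix, and the labels are hypothetical.

```python
import numpy as np
from collections import Counter


def choose_rank_by_percentage(singular_values, percentage):
    """Smallest K whose leading singular values reach `percentage` of their sum.

    Assumption: one plausible reading of a "percentage" rule for K.
    """
    cumulative = np.cumsum(singular_values) / np.sum(singular_values)
    rank_k = int(np.searchsorted(cumulative, percentage)) + 1
    return min(rank_k, len(singular_values))


def fitting_degree(original, approximation):
    """Hypothetical fit measure: 1 - relative Frobenius error (1.0 = identical)."""
    return 1.0 - np.linalg.norm(original - approximation) / np.linalg.norm(original)


def svd_knn_predict(feature_text, labels, query, percentage=0.7, n_neighbors=3):
    """Classify `query` with KNN in the rank-K latent space of `feature_text`.

    feature_text: (n_features, n_texts) matrix, one column per training text.
    Returns (predicted label, chosen K, fitting degree of the rank-K matrix).
    """
    U, s, Vt = np.linalg.svd(feature_text, full_matrices=False)
    rank_k = choose_rank_by_percentage(s, percentage)
    U_k = U[:, :rank_k]

    # Project the training texts and the query into the K-dimensional space.
    train_k = U_k.T @ feature_text   # shape (rank_k, n_texts)
    query_k = U_k.T @ query          # shape (rank_k,)

    # Cosine similarity between the query and every training text.
    norms = np.linalg.norm(train_k, axis=0) * np.linalg.norm(query_k)
    similarity = (train_k.T @ query_k) / np.maximum(norms, 1e-12)

    # Majority vote among the most similar training texts.
    nearest = np.argsort(similarity)[::-1][:n_neighbors]
    votes = Counter(labels[i] for i in nearest)

    approximation = U_k @ train_k    # rank-K reconstruction of the input
    fit = fitting_degree(feature_text, approximation)
    return votes.most_common(1)[0][0], rank_k, fit


# Toy feature-text matrix: rows are features (terms), columns are texts.
feature_text = np.array([
    [2, 1, 0, 0, 0, 0],
    [1, 2, 1, 0, 0, 0],
    [0, 1, 2, 0, 0, 0],
    [0, 0, 0, 3, 1, 0],
    [0, 0, 0, 1, 3, 1],
    [0, 0, 0, 0, 1, 3],
], dtype=float)
labels = ["sports"] * 3 + ["finance"] * 3
query = np.array([1, 0, 1, 0, 0, 0], dtype=float)

label, rank_k, fit = svd_knn_predict(feature_text, labels, query)
print(label, rank_k, round(fit, 3))
```

Running KNN on the projected columns instead of the raw columns is what separates SVD+KNN from plain KNN in this sketch. Raising `percentage` keeps more singular values, which raises the fitting degree at the cost of a larger K.
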
Keywords/Search Tags: Text Classification, SVD, Singular Value Decomposition, Reducing dimensions