
Research On Text Classification Based On Ontology And Latent Semantic Indexing Algorithm

Posted on: 2010-06-16
Degree: Master
Type: Thesis
Country: China
Candidate: N Sun
GTID: 2178360275488913
Subject: Computer software and theory
Abstract/Summary:
With the development of the Internet, the volume of data and information has grown exponentially. As a key method for processing and organizing large numbers of texts, text classification helps people find exactly the knowledge they need, and the explosive growth of information places ever higher demands on it. Traditional methods based on machine learning and statistics require many training samples to train a classification model. If the categories change, training samples must be collected again, which is time-consuming and laborious. Furthermore, these methods represent texts with the vector space model, which leads to high-dimensional feature vectors. Classifying text in such a high-dimensional feature space is difficult: the computational cost is large, and the efficiency is too low to satisfy users' needs.

This paper proposes a general ontology-based framework for text classification and studies in depth both dimensionality reduction and the classification process. We combined the latent semantic indexing algorithm with the ontology scheme within this framework to implement a prototype system. The details are as follows:

1. With the assistance of domain experts, we used the ontology development tool Protégé 3.3 to build a tea ontology manually. The tea ontology serves as background knowledge that provides the semantic information used for text classification.
2. We used the latent semantic indexing algorithm to reduce the high-dimensional, sparse feature space, removing feature terms that contribute little to text classification in order to reduce the dimensionality of the vectors.
3. Building on the previous steps, we used domain ontology knowledge to build a classifier and realized semantic-based text classification.
4. We compared the method with the traditional naive Bayes classifier. Efficient implementations of our algorithms confirmed that the method is correct and feasible, and a series of comparative experiments showed that it achieves better classification accuracy and improves the performance of text classification.

As a means of knowledge organization and representation, ontology has many advantages and potential functions in theory. Introducing ontology into text mining applications offers a new approach to automatic text classification. Ontology-based text classification does not need training samples: it obtains semantic information from the ontology, combines it with the key techniques of text classification, and so realizes automatic text classification. This research provides an important foundation for semantic-based data mining and has great practical value and broad application prospects.
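
The abstract does not give the details of the classifier, so the following is only a minimal sketch of the general idea, under stated assumptions: latent semantic indexing is done with a truncated SVD of a term-document matrix, and a text is assigned to the concept whose terms lie closest to it in the reduced space. The toy corpus, the `CONCEPT_TERMS` dictionary (a hypothetical stand-in for terms attached to ontology concepts), and all function names are illustrative and are not taken from the thesis.

```python
import numpy as np

# Toy corpus: an illustrative assumption, not the thesis data.
DOCS = [
    "green tea steamed unfermented",
    "green tea steamed fresh",
    "black tea fermented oxidized",
    "black tea oxidized strong",
]
# Hypothetical stand-in for terms attached to concepts in a tea ontology.
CONCEPT_TERMS = {
    "green_tea": ["green", "steamed", "unfermented"],
    "black_tea": ["black", "fermented", "oxidized"],
}

VOCAB = sorted({token for doc in DOCS for token in doc.split()})
TERM_INDEX = {term: i for i, term in enumerate(VOCAB)}


def term_vector(tokens):
    """Count vector over VOCAB; tokens outside the vocabulary are ignored."""
    vec = np.zeros(len(VOCAB))
    for token in tokens:
        if token in TERM_INDEX:
            vec[TERM_INDEX[token]] += 1.0
    return vec


def fit_lsi(docs, k):
    """Truncated SVD of the term-document matrix, keeping k latent dimensions."""
    matrix = np.column_stack([term_vector(doc.split()) for doc in docs])
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k], s[:k]


def fold_in(vec, u_k, s_k):
    """Project a term vector into the latent space: vec^T U_k S_k^{-1}."""
    return (vec @ u_k) / s_k


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def classify(text, u_k, s_k):
    """Return the concept whose latent vector is most similar to the text."""
    doc_vec = fold_in(term_vector(text.lower().split()), u_k, s_k)
    scores = {
        label: cosine(doc_vec, fold_in(term_vector(terms), u_k, s_k))
        for label, terms in CONCEPT_TERMS.items()
    }
    return max(scores, key=scores.get)


# Reduce the 9-term vocabulary to 2 latent dimensions.
U_K, S_K = fit_lsi(DOCS, k=2)
```

In this sketch no labelled training samples are used: the concept terms alone define the categories, so changing the categories would mean editing `CONCEPT_TERMS` rather than re-collecting samples. Calling `classify("steamed unfermented tea", U_K, S_K)` selects `green_tea` in this toy setup. A real system would need term weighting, stop-word handling, and a far larger corpus.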
Keywords/Search Tags: Text Categorization, Ontology, Feature Reduction, Latent Semantic Indexing, Vector Space Model