Research On Text Classification Based On Ontology And Latent Semantic Indexing Algorithm

Posted on:2010-06-16

Degree:Master

Type:Thesis

Country:China

Candidate:N Sun

GTID:2178360275488913

Subject:Computer software and theory

Abstract/Summary:

With the development of the Internet, the volume of data and information has grown exponentially. As a key method for processing and organizing large numbers of texts, text classification helps people find exactly the knowledge they need, and the explosive growth of information places ever higher demands on it. Traditional methods based on machine learning and statistics require many training samples to train a classification model. If the categories change, new training samples must be collected, which is time-consuming and laborious. Furthermore, these methods represent texts with the vector space model, which produces high-dimensional feature vectors. Classification in such a high-dimensional feature space is difficult, and the resulting computational cost and low efficiency cannot meet users' needs.

This paper proposes a general ontology-based framework for text classification and studies both the dimensionality reduction and the classification process in depth. On this framework we combined the latent semantic indexing algorithm with an ontology scheme to build a prototype system. The main work is as follows:

1. With the assistance of domain experts, we used the ontology development tool Protégé 3.3 to build a tea ontology manually. The tea ontology serves as background knowledge that supplies the semantic information needed for text classification.
2. We used the latent semantic indexing algorithm to reduce the high-dimensional, sparse feature space, removing feature terms that contribute little to classification in order to lower the dimensionality of the vectors.
3. Building on this, we used domain ontology knowledge to construct a classifier and implemented semantic-based text classification.
4. An efficient implementation of our algorithms, compared against the traditional naive Bayes classifier, confirmed that the method is correct and feasible.
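The dimensionality reduction in step 2 can be illustrated with a minimal Python/NumPy sketch of standard latent semantic indexing. It is not the thesis's implementation: the toy English corpus is a hypothetical stand-in for tea-domain texts, raw term counts are assumed as term weights (the abstract does not state the weighting scheme), k = 2 is an arbitrary choice, and the function names are hypothetical. The sketch builds a vector-space term-document matrix A, keeps its truncated SVD A ≈ U_k Σ_k V_kᵀ, and folds a new document vector d into the k-dimensional latent space as Σ_k⁻¹ U_kᵀ d. Note that standard LSI does not delete individual terms; it discards the latent directions with the smallest singular values.

```python
import numpy as np


def term_document_matrix(docs, vocab):
    """Vector space model: one row per term, one column per document (raw counts assumed)."""
    index = {term: i for i, term in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for token in doc.split():
            if token in index:
                A[index[token], j] += 1
    return A


def lsi(A, k):
    """Truncated SVD: keep the k largest singular values and their singular vectors."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]


def fold_in(d, U_k, s_k):
    """Project a term vector d into the k-dimensional latent space: Sigma_k^-1 U_k^T d."""
    return (U_k.T @ d) / s_k


def cosine(x, y):
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))


# Toy corpus (hypothetical stand-in for tea-domain texts).
docs = [
    "green tea leaf brewing",
    "green tea leaf picking",
    "black tea fermentation oxidation",
    "black tea oxidation flavor",
]
vocab = sorted({t for d in docs for t in d.split()})
A = term_document_matrix(docs, vocab)   # 9 terms x 4 documents
U_k, s_k, Vt_k = lsi(A, k=2)            # columns of Vt_k: 2-dim document coordinates

q = term_document_matrix(["green tea leaf"], vocab)[:, 0]
q_hat = fold_in(q, U_k, s_k)
print(cosine(q_hat, Vt_k[:, 0]) > cosine(q_hat, Vt_k[:, 2]))  # True
```

Folding a training document back in reproduces its column of V_kᵀ, and the query "green tea leaf" lands far closer to the green-tea documents than to the black-tea ones, even though both groups share the term "tea".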
A series of comparative experiments further showed that the proposed method achieves better classification accuracy and improves the performance of text classification.

As a means of knowledge organization and representation, ontology has many theoretical advantages and potential functions. Introducing ontology into text mining applications offers a new approach to automatic text classification. Ontology-based text classification does not require training samples: it obtains semantic information from the ontology and combines it with the key techniques of text classification to classify texts automatically. This research provides an important foundation for semantic-based data mining and has considerable practical value and broad application prospects.
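One simple way an ontology can stand in for training samples is to score a document against each category by matching its terms to the labels of the concepts under that category. The Python sketch below is hypothetical: the abstract does not describe the thesis's scoring rule, so the depth-decayed weighting, the `decay` parameter, the function names, and the tiny tea ontology fragment (hand-coded here rather than loaded from the Protégé 3.3 project) are all assumptions. Labels of the category concept itself score 1, and each level of sub-concept below it multiplies the weight by `decay`.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Concept:
    """Hypothetical ontology node: a concept name, its lexical labels, and its sub-concepts."""
    name: str
    labels: set[str]
    children: list[Concept] = field(default_factory=list)


def concept_weights(category: Concept, decay: float = 0.5) -> dict[str, float]:
    """Map every label under `category` to decay**depth (an assumed weighting scheme)."""
    weights: dict[str, float] = {}
    stack = [(category, 0)]
    while stack:
        node, depth = stack.pop()
        w = decay ** depth
        for label in node.labels:
            # A label reachable at several depths keeps its largest weight.
            weights[label] = max(weights.get(label, 0.0), w)
        stack.extend((child, depth + 1) for child in node.children)
    return weights


def classify(tokens: list[str], categories: list[Concept], decay: float = 0.5):
    """Pick the category whose concepts best match the tokens; no training data is involved."""
    scores: dict[str, float] = {}
    for category in categories:
        weights = concept_weights(category, decay)
        scores[category.name] = sum(weights.get(t, 0.0) for t in tokens)
    best = max(scores, key=scores.get)  # ties go to the first category listed
    return best, scores


# Hypothetical fragment of a tea ontology with two top-level categories.
green_tea = Concept("green_tea", {"green", "unfermented"}, [
    Concept("longjing", {"longjing", "dragonwell"}),
    Concept("biluochun", {"biluochun"}),
])
black_tea = Concept("black_tea", {"black", "fermented"}, [
    Concept("keemun", {"keemun"}),
    Concept("dianhong", {"dianhong"}),
])

best, scores = classify("this longjing is an unfermented tea".split(), [green_tea, black_tea])
print(best, scores)  # green_tea {'green_tea': 1.5, 'black_tea': 0.0}
```

Because the category knowledge lives in the ontology, adding or changing a category means editing concepts rather than collecting and labelling new training samples. In this sketch, ties (including documents that match nothing) fall back to the first category listed; a practical system would likely need a threshold or an "unclassified" outcome.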

Keywords/Search Tags:

Text Categorization, Ontology, Feature Reduction, Latent Semantic Indexing, Vector Space Model

Related items

1	Research On Web Text Categorization Based On Latent Semantic Analysis
2	Research On Support Vector Machines Classification Algorithm In Text Categorization
3	Research On Text Classification Filtering Technology Based On Latent Semantic Indexing And Support Vector Machine
4	Research And Improvement Of Latent Semantic Indexing Classification Model
5	The Implementation And Research Of The Probabilistic Latent Semantic Analysis Model In The Search Engine's Business Text Classification System
6	A Latent Semantic Indexing Differences Model And Its Application
7	Research And Application Of Automatic Classification Method For Work Tickets
8	The Research Of Optimization Technology In Latent Semantic Indexing Based On Pseudo Text
9	Web Text Mining Based On Latent Semantic Indexing
10	Research On Feature Selection Method Based On Text Category Relevance Degree And Latent Semantic Analysis