Research and Implementation of an Automatic Text Classification System Based on the Vector Space Model

Posted on: 2008-11-09
Degree: Master
Type: Thesis
Country: China
Candidate: Q Ding
Full Text: PDF
GTID: 2178360212979925
Subject: Software engineering
Abstract/Summary:
With the rapid growth of information on the Internet, information processing has become a necessary tool for finding useful information. Text classification, which assigns texts to classes within a given class system according to their content, is one of its most important research areas. Since the 1990s the Internet has grown so dramatically that it now contains a huge amount of raw information, including text, sound, and images. Extracting the most valuable information from this huge, disordered body of text is one of the goals of information processing. Recently, automatic text classification has been combined with search engines and with information push, delivery, and filtering, and has effectively improved information services. Acquiring useful information quickly and effectively from this sea of information has become a very important problem, and automatic text classification has been proposed and studied in applications for this purpose.

This paper discusses the key technologies of text classification and the vector space model. The key technologies include word segmentation, choosing the feature unit, feature selection, feature weight computation, and classification methods.

When words are used as features, the useful ones are almost always nouns, adjectives, or verbs. A method that selects only these three kinds of words was therefore adopted to reduce the dimensionality, replacing the traditional stop-list preprocessing.

A new weight adjustment method was proposed through an analysis of TFIDF. Because TFIDF alone could not weight words properly, the new method introduces a feature evaluation function into the feature weight computation and adjusts each feature's contribution. The new method improved categorization accuracy.
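
As a rough illustration of the vector space model and TFIDF weighting mentioned in the abstract, the following minimal Python sketch builds TFIDF vectors for tokenized documents and compares them by cosine similarity. The abstract does not give the thesis's adjusted weighting formula, so the optional `feature_score` multiplier is purely a hypothetical stand-in for "a feature evaluation function that adjusts each feature's contribution"; the function names, the logarithmic IDF variant, and the sample documents are likewise assumptions, not the author's implementation.

```python
import math
from collections import Counter


def tfidf_vectors(docs, feature_score=None):
    """Build one sparse TFIDF vector (term -> weight) per tokenized document.

    feature_score is a HYPOTHETICAL per-term multiplier standing in for a
    feature evaluation function; the thesis's actual formula is not known.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    vectors = []
    for tokens in docs:
        tf = Counter(tokens)
        vec = {}
        for term, count in tf.items():
            idf = math.log(n_docs / df[term])  # plain log IDF (assumed variant)
            scale = 1.0 if feature_score is None else feature_score.get(term, 1.0)
            vec[term] = count * idf * scale
        vectors.append(vec)
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Illustrative, made-up documents (already segmented into words).
docs = [
    ["stock", "market", "price"],
    ["stock", "market", "trade"],
    ["football", "match", "goal"],
]
vecs = tfidf_vectors(docs)
```

In this sketch, a classifier could assign a new document to the class whose training vectors it is most similar to under `cosine`; passing a `feature_score` dictionary simply rescales selected term weights before that comparison.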
Keywords/Search Tags:Vector Space Model, Feature extraction, Classification, Training