Study Of Text Classification Model Based On Key Vector

Posted on: 2009-07-13   Degree: Master   Type: Thesis
Country: China   Candidate: B Zhao   Full Text: PDF
GTID: 2178360245486554   Subject: Computer application technology
Abstract/Summary:
With the rapid development of computer and network technology, the Internet has become the primary means of storing and accessing information, and the amount of text stored online is growing exponentially. This gives users access to massive amounts of information, but it also makes it hard for them to find what is useful. How to retrieve the needed information quickly and accurately has therefore become an important research topic. Text classification technology can organize text effectively and help people locate it accurately and efficiently. Since it was first proposed in the 1960s, automatic text classification has developed considerably and has been widely applied in areas such as search engines, digital libraries, and information retrieval.
The vector space model is a general model used in large-scale text processing, and the current mainstream classification algorithms, such as K-Nearest Neighbors (KNN) and Support Vector Machine (SVM), are all based on it. Although these algorithms have been studied and applied broadly and in depth, they still have many shortcomings. Based on theoretical study and a review of the literature, this thesis examines the characteristics of the vector space model and analyzes the shortcomings of existing algorithms. To address the overly simple way in which mainstream algorithms treat training documents, it proposes a text classification algorithm based on the vector space model and introduces the concept of the key vector: by analyzing the training documents, the algorithm identifies the key vectors in each category and assigns them weights so that they provide more information about the category, and then classifies test documents using these key vectors.
The algorithm was then tested and compared with traditional algorithms. The results show that it improves both classification accuracy and speed compared with the traditional algorithms.
Keywords/Search Tags:data mining, text classification, vector space model, key vector
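The abstract does not say how key vectors are selected or weighted, so the following is only an illustrative sketch of the general idea, not the thesis's algorithm. It assumes (hypothetically) that the key vectors of a category are the training documents closest to the category centroid, that each key vector's weight is its cosine similarity to that centroid, and that a test document is assigned to the category with the highest weighted similarity sum. All function names, the `k` parameter, and the toy data are invented for illustration.

```python
import math
from collections import Counter


def tf_vector(text):
    """Represent a document as a term-frequency vector (vector space model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_key_vectors(docs_by_category, k=2):
    """ASSUMPTION: the key vectors of a category are the k training documents
    closest to the category centroid; the weight is that similarity."""
    key_vectors = {}
    for category, texts in docs_by_category.items():
        vectors = [tf_vector(t) for t in texts]
        centroid = Counter()
        for vec in vectors:
            centroid.update(vec)  # summed vector; cosine ignores the scale
        scored = [(cosine(vec, centroid), vec) for vec in vectors]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        key_vectors[category] = scored[:k]
    return key_vectors


def classify(text, key_vectors):
    """Assign the category whose weighted key vectors best match the text."""
    query = tf_vector(text)
    scores = {
        category: sum(weight * cosine(query, vec) for weight, vec in pairs)
        for category, pairs in key_vectors.items()
    }
    return max(scores, key=scores.get)


# Toy training data, invented for this sketch.
TRAINING = {
    "sports": [
        "the team won the football match",
        "football players scored a goal in the match",
        "the coach praised the team",
    ],
    "tech": [
        "the new computer chip is fast",
        "software runs on the computer",
        "the chip improves software speed",
    ],
}

if __name__ == "__main__":
    keys = select_key_vectors(TRAINING, k=2)
    print(classify("the football team scored", keys))
    print(classify("fast computer software", keys))
```

Because a test document is compared only with the few key vectors kept per category rather than with every training document (as in KNN), a scheme of this kind does less work at classification time, which is consistent with the speed improvement the abstract reports.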