
The Research Of Text Classification Based On Semantic Web

Posted on: 2012-08-27   Degree: Master   Type: Thesis
Country: China   Candidate: X X Song
GTID: 2178330332992694   Subject: Computer software and theory
Abstract/Summary:
With the development of the Internet, the number of documents on the network is growing exponentially, and how to handle such a large volume of documents has become an important research focus. Text classification can process large numbers of texts, go a long way toward resolving the current disorder of online information, and help users accurately locate the information they need. The purpose of this thesis is to develop a Semantic Web-based text classification system that can serve as technical infrastructure for information retrieval, information filtering, search engines, text databases, digital libraries, and other areas.

The thesis studies the application of the Semantic Web to text classification. In text feature extraction, nouns and verbs usually represent a document's category better than other parts of speech, so selecting features by part of speech effectively reduces the dimension of the feature vectors; the weight of each extracted feature word is obtained by summing its occurrences. The text is then represented as triples, following the RDF rules that describe the information on a page and the relationships within it; this representation is the basis for all subsequent processing. The thesis also analyzes a shortcoming of the traditional KNN algorithm, namely its strong dependence on the number of stored samples, and addresses it by adding weights to KNN so that the principal neighbors contribute more to the decision than minor ones.

Because a Semantic Web-based text classification system can make better use of semantic relationships to classify accurately, the thesis presents a Semantic Web-based text classification model. Six classes are selected to verify the experimental results: computers, plants, animals, food, sports, and military.
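The part-of-speech feature extraction and RDF-style triple representation described in the abstract can be illustrated with a minimal sketch. It assumes the text has already been segmented and tagged by some tagger (the tag names "n" and "v" are an assumption modeled on common Chinese taggers), that a feature's weight is simply its occurrence count, and that triples are plain Python tuples rather than a real RDF store. The predicate names and the node-naming scheme are hypothetical, and the sketch is written in Python for brevity rather than the Visual C++ used in the thesis.

```python
from collections import Counter

# Hypothetical part-of-speech tags: "n" = noun, "v" = verb.
# Only nouns and verbs are kept as features; the exact tag set is an assumption.
KEEP_TAGS = {"n", "v"}


def extract_features(tagged_tokens):
    """Keep only nouns and verbs; weight each word by its number of occurrences."""
    return Counter(word for word, tag in tagged_tokens if tag in KEEP_TAGS)


def to_triples(doc_id, features):
    """Encode weighted features as RDF-style (subject, predicate, object) tuples.

    The predicate names and the "<doc>#f<i>" node naming are hypothetical.
    """
    triples = []
    for i, (word, weight) in enumerate(sorted(features.items())):
        node = f"{doc_id}#f{i}"  # one intermediate node per feature word
        triples.append((doc_id, "hasFeature", node))
        triples.append((node, "term", word))
        triples.append((node, "weight", weight))
    return triples


tagged = [("football", "n"), ("team", "n"), ("wins", "v"),
          ("the", "det"), ("football", "n"), ("easily", "adv")]
features = extract_features(tagged)
# Distinct words before filtering vs. feature dimension after filtering.
print(len({w for w, _ in tagged}), "->", len(features))  # 5 -> 3
for triple in to_triples("doc1", features):
    print(triple)
```

Dropping the determiner and adverb shrinks the vector from five distinct words to three features, which is the dimension reduction the abstract attributes to part-of-speech filtering.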
Finally, the text classification system is developed on the Microsoft Visual C++ 6.0 platform. Analysis and experiments show that the feature extraction method reduces the dimension of the feature vectors, and tests on the selected texts show that the proposed algorithm achieves better classification; the system can provide strong support for large news sites.
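The weighted KNN idea in the abstract (letting the principal neighbors count for more than minor ones, so the decision depends less on how many stored samples a class has) can be sketched as similarity-weighted voting. This is a minimal sketch, not the thesis's exact scheme: it assumes cosine similarity over sparse {term: weight} vectors and lets each of the k nearest samples vote with its similarity score. The function names and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def weighted_knn(query, training, k=3):
    """Similarity-weighted KNN over (vector, label) training pairs."""
    # Rank every training sample by its similarity to the query.
    ranked = sorted(((cosine(query, vec), label) for vec, label in training),
                    key=lambda pair: pair[0], reverse=True)
    scores = defaultdict(float)
    for sim, label in ranked[:k]:
        scores[label] += sim  # closer neighbors contribute more than distant ones
    return max(scores, key=scores.get)


query = {"football": 2, "team": 1, "season": 1}
training = [
    ({"football": 2, "team": 1}, "sports"),
    ({"season": 1, "rice": 3}, "food"),
    ({"season": 1, "soup": 3}, "food"),
    ({"tank": 1, "army": 1}, "military"),
]
print(weighted_knn(query, training, k=3))  # sports
```

With k = 3 the neighbors are one close "sports" sample and two weakly related "food" samples, so a plain majority vote would choose "food"; weighting by similarity lets the single close neighbor decide.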
Keywords/Search Tags: Feature Extraction, KNN Algorithm, Text Classification, Semantic Web