
The Effective Text Keyword Extraction Technologies And Their Applications

Posted on: 2015-05-14
Degree: Master
Type: Thesis
Country: China
Candidate: Y L M P E H T Re
GTID: 2298330431991890
Subject: Signal and Information Processing

Abstract/Summary:
With the advent of the Internet era, online documents have grown rapidly in number and are still increasing. Faced with such vast information resources, it is very important to extract the key elements of information effectively. Keyword extraction is a core technology for automatic text summarization, text classification, text clustering, and information retrieval.

In this thesis, a text corpus for training and testing is first built from 1000 documents: 500 belong to the health category, and the remaining 500 are non-health documents about computers, education, economy, and other topics. Keyword extraction based on TextRank is then applied in a keyword extraction experiment. The highest document classification accuracy obtained with this method is 75.5%, and increasing the number of keywords does not significantly improve classification performance. To further improve classification accuracy, this thesis proposes a keyword extraction method based on discriminative TF/IDF keywords, which are obtained from the between-class differences in the combined statistics of the same words. This method achieves a markedly higher classification accuracy of 98.5% with 100 keywords. Although the discriminative TF/IDF method is effective, it has drawbacks: it requires a large number of keywords and lacks a theoretical basis.

Therefore, this thesis also draws on SDA (sparse discriminant analysis), a method commonly used in the field of biotechnology. Experimental results show that SDA achieves a document classification accuracy of 98% with fewer keywords (90). To further improve accuracy with a small keyword set, a keyword extraction method based on SparseSVM is also studied.
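The abstract describes the discriminative TF/IDF keywords only as arising from between-class differences in word statistics and gives no formula. The sketch below assumes one simple reading: compute a smoothed TF-IDF weight for each word in each document, average it separately over the health (label 1) and non-health (label 0) classes, and rank words by the absolute difference of the two class means. The function names `tfidf_vectors` and `discriminative_keywords`, the smoothed IDF variant, and the toy documents are illustrative assumptions, not the thesis's exact method.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Smoothed TF-IDF weights for tokenized documents (each a list of words)."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each word once per document
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            # term frequency * smoothed IDF (assumed variant: ln((1+n)/(1+df)) + 1)
            w: (c / len(doc)) * (math.log((1 + n) / (1 + df[w])) + 1)
            for w, c in counts.items()
        })
    return vectors


def discriminative_keywords(docs, labels, k):
    """Rank words by |mean TF-IDF in class 1 - mean TF-IDF in class 0|; return the top k."""
    vectors = tfidf_vectors(docs)
    pos = [v for v, y in zip(vectors, labels) if y == 1]
    neg = [v for v, y in zip(vectors, labels) if y == 0]

    def class_mean(group, word):
        # Documents that lack the word contribute a weight of 0.
        return sum(v.get(word, 0.0) for v in group) / len(group)

    vocab = set().union(*vectors)
    scores = {w: abs(class_mean(pos, w) - class_mean(neg, w)) for w in vocab}
    # Highest score first; ties broken alphabetically for a deterministic order.
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


docs = [
    ["doctor", "health", "diet"],
    ["health", "exercise", "doctor"],
    ["computer", "software", "diet"],
    ["economy", "market", "software"],
]
labels = [1, 1, 0, 0]  # hypothetical toy labels: 1 = health, 0 = non-health
print(discriminative_keywords(docs, labels, 3))  # ['doctor', 'health', 'software']
```

Because this score uses an absolute difference, it selects words typical of either class ("software" marks the non-health class), while "diet", which is equally common in both classes, scores exactly zero and ranks last.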
In the experiments, with 10, 20, and 30 keywords, SDA achieves document classification accuracies of 88.5%, 90.5%, and 91.5%, respectively, while SparseSVM achieves 90%, 92%, and 95.5%. These results show that SparseSVM is preferable for document classification when only a small number of keywords is used.

To verify the performance stability of these techniques, this thesis also reports Uyghur text emotion identification results obtained with the four methods above, and the results are satisfactory.
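SparseSVM is named but not defined in the abstract. As a hedged illustration, the sketch below assumes a linear SVM with an L1 penalty on its weights (hinge loss plus `lam * ||w||_1`), trained by full-batch proximal subgradient descent, where the words whose weights remain nonzero form the keyword set. The names `train_sparse_svm`, `soft_threshold`, and `selected_keywords`, the hyperparameters `lam`, `lr`, and `epochs`, and the toy data are all hypothetical.

```python
def soft_threshold(x, t):
    """Proximal operator of t*|x|: shrink x toward zero, returning exactly 0.0 inside [-t, t]."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def train_sparse_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Hypothetical L1-regularized linear SVM (hinge loss + lam * ||w||_1).

    X is a list of feature vectors (e.g. TF-IDF rows), y holds labels in {-1, +1}.
    Trained with full-batch proximal subgradient descent; returns (weights, bias).
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        # Subgradient of the mean hinge loss max(0, 1 - y * (w.x + b)).
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        # Gradient step on the loss, then soft-thresholding for the L1 penalty
        # (the bias is not penalized).
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def selected_keywords(w, vocab):
    """Words whose weight is nonzero after training form the keyword set."""
    return [word for word, wj in zip(vocab, w) if wj != 0.0]


vocab = ["health", "computer", "the"]
X = [[1, 0, 1], [1, 0, 1], [0, 1, 1], [0, 1, 1]]
y = [1, 1, -1, -1]  # +1 = health, -1 = non-health
w, b = train_sparse_svm(X, y)
print(selected_keywords(w, vocab))  # ['health', 'computer']
```

In this toy run the word "the" appears in every document, receives no net gradient, and keeps a weight of exactly zero, so only "health" and "computer" are selected. A larger `lam` tends to drive more weights to exactly zero, trading keyword count against fit; the values used here are illustrative only.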
Keywords/Search Tags: Uyghur text, Keyword extraction, Discriminative keywords, SparseSVM, Sparse discriminant analysis