The Effective Text Keyword Extraction Technologies And Their Applications

Posted on:2015-05-14

Degree:Master

Type:Thesis

Country:China

Candidate:Y L M P E H T Re

Full Text:PDF

GTID:2298330431991890

Subject:Signal and Information Processing

Abstract/Summary:

With the advent of the Internet era, online documents began to emerge and are still growing rapidly in volume. Faced with such vast information resources, effectively extracting the key elements of information is very important. Keyword extraction is the most effective technology for research on automatic text summarization, text classification, text clustering, and information retrieval.

In this thesis, a text library for training and testing is first established from 1000 documents, of which 500 belong to the health category and the remaining 500 are non-health documents about computers, education, economy, and other topics. Then a TextRank-based keyword extraction method is applied in a keyword extraction experiment. The highest document classification accuracy obtained with this method is 75.5%, and increasing the number of keywords does not significantly improve classification performance. To further improve classification accuracy, this thesis proposes a keyword extraction method based on discriminative TF/IDF keywords, which are obtained from the between-class differences in the combined statistics of the same words. This method achieves a markedly higher classification accuracy of 98.5% (with 100 keywords). Although the discriminative TF/IDF keyword extraction method is effective, it has disadvantages: it requires a large number of keywords and lacks a theoretical basis.

Therefore, this thesis also draws on sparse discriminant analysis (SDA), a method commonly used in the field of biotechnology. Experimental results show that SDA yields a document classification accuracy of 98% with a smaller keyword set (90 keywords). To further improve accuracy with a small keyword set, a keyword extraction method based on SparseSVM is also studied. In the experiments, with 10, 20, and 30 keywords, SDA achieves document classification accuracies of 88.5%, 90.5%, and 91.5%, respectively, while SparseSVM achieves 90%, 92%, and 95.5%. These results show that SparseSVM is preferable for document classification with a smaller keyword set.

To verify the stability of these techniques, this thesis also reports Uyghur text emotion identification results obtained with the four methods above, and the results are satisfactory.
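
The abstract does not give the exact formula behind the discriminative TF/IDF keywords, so the following Python sketch is only one plausible reading of "the difference between classes": each term is scored by the absolute difference between its mean TF-IDF weight in two classes, and the top-scoring terms are kept. The function names (`mean_tfidf_by_class`, `discriminative_keywords`), the smoothed IDF variant, and the toy documents are assumptions made for illustration and are not taken from the thesis.

```python
import math
from collections import Counter


def mean_tfidf_by_class(docs, labels):
    """Return {label: {term: mean TF-IDF of the term over that class's documents}}.

    docs is a list of token lists; labels is a parallel list of class names.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (an assumed variant; the thesis does not specify one).
    idf = {term: math.log(n_docs / count) + 1.0 for term, count in df.items()}

    class_sizes = Counter(labels)
    sums = {label: Counter() for label in class_sizes}
    for doc, label in zip(docs, labels):
        for term, freq in Counter(doc).items():
            # Term frequency normalised by document length, times IDF.
            sums[label][term] += (freq / len(doc)) * idf[term]

    return {
        label: {term: total / class_sizes[label] for term, total in acc.items()}
        for label, acc in sums.items()
    }


def discriminative_keywords(docs, labels, class_a, class_b, k):
    """Return the k terms whose mean TF-IDF differs most between two classes."""
    means = mean_tfidf_by_class(docs, labels)
    a, b = means[class_a], means[class_b]
    scores = {
        term: abs(a.get(term, 0.0) - b.get(term, 0.0))
        for term in set(a) | set(b)
    }
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda term: (-scores[term], term))
    return ranked[:k]


if __name__ == "__main__":
    # Toy corpus, purely illustrative.
    toy_docs = [
        ["doctor", "health", "diet"],
        ["health", "vaccine", "doctor"],
        ["computer", "software", "price"],
        ["school", "computer", "exam"],
    ]
    toy_labels = ["health", "health", "other", "other"]
    print(discriminative_keywords(toy_docs, toy_labels, "health", "other", 3))
```

In this reading, a term that is frequent in one class and rare in the other receives a high score, while a term spread evenly across both classes scores near zero. The selected terms would then serve as features for a downstream document classifier.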

Keywords/Search Tags:

Uyghur text, Keyword extraction, Discriminative keywords, SparseSVM, Sparse discriminant analysis

Related items

1	Research On Uyghur Discriminative Keyword Extraction Algorithm And Its Performance Analysis
2	Based On Telephone Voice Recognition System Of Uyghur Keywords
3	Improved Sparse Discriminative Feature Extraction Methods
4	Research On Keyword Extraction Technology Oriented To Conversational Text
5	Automatic Extraction Of Keywords And Text Summarization In Text Mining
6	The Research Of Keywords Extraction Algorithm In Text Mining
7	Application Of Sparse Linear Discriminant Analysis On Text Classification
8	Research On The Filtering Method Of Uyghur Adverse Text Information
9	The Research Of Keyword Extraction Algorithms On English Short Test Text
10	An Efficient Keywords Extraction Algorithm For Text Comprehension