Study On Extraction Of Uygur Keywords In Public Opinion Analysis

Posted on: 2016-02-02
Degree: Master
Type: Thesis
Country: China
Candidate: X K Su
Full Text: PDF
GTID: 2308330476950399
Subject: Computer application technology
Abstract/Summary:
With the rapid development of networked information, the volume of information worldwide is growing explosively. At the same time, digital information and networks in the Xinjiang area are developing steadily, and the amount of information grows day by day, which makes traditional manual screening of a text's theme and theme words impractical. Faced with massive data, how to help users obtain the information they want quickly and efficiently is a direction that people have always been exploring, and keyword extraction in text mining can help greatly. Owing to geographical differences and lagging technology, a mature, open and practical Uighur keyword extraction method has not yet been formed and put into use. Realizing such a method would be useful for information retrieval, public opinion monitoring, search engines and other fields, as well as for government, medical, education and other departments. It could play an important role in the development of Xinjiang, and it could serve as a representative model for minority languages.
The method computes statistics on word order and word combinations, calculates the weights of the impact factors, and comprehensively weights and ranks the candidate words. It filters out stopwords, low-frequency words and combined words and, after several rounds of filtering and re-weighting, extracts the keywords, which let readers identify the theme and main idea of a text. We then transplanted the method into the Uighur single-text keyword extraction process, also taking Uighur stemming and stopword filtering into account.
While limiting the loss of accuracy caused by abandoning the TF-IDF algorithm, we implemented a comprehensive weight-based feature-statistics method and successfully extracted Uighur keywords from single texts. Because the TF-IDF algorithm was abandoned, efficiency improved greatly. Experiments showed that the accuracy of text recognition stayed above 65% and the accuracy of single-text recognition for Uighur stayed above 56%; to a certain extent, the method also gave feedback on unlabeled words.
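The single-text weighting idea described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the feature set (normalized frequency and first-occurrence position), the weights `w_freq` and `w_pos`, the `min_freq` threshold, the use of adjacent word pairs as "word combinations", and the identity `stem` placeholder are all assumptions chosen for illustration. The point is only that candidates can be ranked from statistics of one text, with no corpus-wide IDF term.

```python
from collections import Counter


def extract_keywords(tokens, stopwords, top_k=5, min_freq=2,
                     w_freq=0.7, w_pos=0.3, stem=lambda w: w):
    """Rank candidates from ONE text by a weighted sum of simple features.

    All parameters and features here are illustrative assumptions; a real
    Uighur pipeline would pass an actual stemmer as `stem`.
    """
    stems = [stem(t) for t in tokens]
    n = len(stems)

    # Single-word candidates: every non-stopword, with its position.
    words = [(i, s) for i, s in enumerate(stems) if s not in stopwords]
    # Word combinations: adjacent pairs in which neither word is a stopword.
    pairs = [(i, stems[i] + " " + stems[i + 1]) for i in range(n - 1)
             if stems[i] not in stopwords and stems[i + 1] not in stopwords]
    candidates = words + pairs

    freq = Counter(c for _, c in candidates)
    first = {}
    for i, c in candidates:
        first.setdefault(c, i)  # remember the earliest position only

    # Filter out low-frequency candidates.
    kept = {c: f for c, f in freq.items() if f >= min_freq}
    if not kept:
        return []

    max_f = max(kept.values())
    # Combined weight: frequent candidates and early candidates score higher.
    scores = {c: w_freq * f / max_f + w_pos * (1 - first[c] / n)
              for c, f in kept.items()}
    # Sort by score (descending), breaking ties alphabetically.
    return sorted(scores, key=lambda c: (-scores[c], c))[:top_k]


if __name__ == "__main__":
    # English tokens stand in for a tokenized Uighur text.
    text = "the market price rises and the market price falls while the market waits"
    print(extract_keywords(text.split(), {"the", "and", "while"}))
```

Because every feature is computed from the text itself, the cost is linear in the text length and no document collection has to be scanned, which is consistent with the efficiency gain the abstract attributes to dropping TF-IDF.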
Keywords/Search Tags:weight, keywords, single text, word combinations, Uighur