Study On Extraction Of Uygur Keywords In Public Opinion Analysis

Posted on:2016-02-02

Degree:Master

Type:Thesis

Country:China

Candidate:X K Su

Full Text:PDF

GTID:2308330476950399

Subject:Computer application technology

Abstract/Summary:

With the rapid development of networked information, the volume of information worldwide is growing explosively. Digital information and networking in the Xinjiang region are also developing steadily, and the amount of information grows day by day, so manually screening texts for their themes and theme words is no longer practical. Faced with massive data, how to help users obtain the information they want quickly and efficiently is a direction that people have long been exploring. Keyword extraction, a text mining technique, can help greatly. Owing to geographical differences and lagging technology, a mature, open, and practical Uighur keyword extraction method has not yet been developed and put into use. Realizing such a method would benefit information retrieval, public opinion monitoring, search engines, and other fields, as well as government, medical, education, and other departments. It could play an important role in the development of Xinjiang and could serve as a representative model for minority languages. The method gathers statistics on word order and word combinations, calculates the weights of the impact factors, computes a comprehensive weight for each candidate word and ranks the candidates, filters out stopwords and low-frequency words, filters the combined words, and then extracts the keywords. After several rounds of filtering and weighting, the extracted words allow readers to identify the theme and main idea of a text. We then transplanted the method into the process of extracting keywords from a single Uighur text, also taking Uighur stemming and stopword filtering into account.
While limiting the loss of accuracy caused by abandoning the TF-IDF algorithm, we implemented a weight-based comprehensive feature statistics method and successfully extracted Uighur keywords from single texts. Because the TF-IDF algorithm was abandoned, efficiency improved greatly. Experiments showed that the accuracy of text recognition stayed above 65% and the accuracy of single-text recognition for Uighur stayed above 56%; the method also gave feedback, to a certain extent, on unlabeled words.
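
As an illustration only, the kind of single-text weighted ranking described above might be sketched in Python as follows. This is not the thesis's implementation: the two factors used here (relative frequency and position of first occurrence), the factor weights `w_freq` and `w_pos`, the `min_freq` threshold, and the identity `stem` placeholder are all assumptions chosen for the example, and the word-combination handling mentioned in the abstract is omitted.

```python
from collections import Counter


def extract_keywords(tokens, stopwords=frozenset(), stem=lambda w: w,
                     min_freq=2, top_k=5, w_freq=0.6, w_pos=0.4):
    """Rank candidate words of a single text by a combined weight.

    Every factor and weight here is an illustrative assumption,
    not a value taken from the thesis.
    """
    # Stem every token (the default stemmer is a no-op placeholder).
    stems = [stem(t) for t in tokens]

    # Keep (position, stem) pairs for tokens that are not stopwords.
    candidates = [(i, s) for i, s in enumerate(stems) if s not in stopwords]
    if not candidates:
        return []

    # Frequency of each candidate stem within this single text.
    freq = Counter(s for _, s in candidates)

    # Position of the first occurrence of each candidate stem.
    first_pos = {}
    for i, s in candidates:
        first_pos.setdefault(s, i)

    max_freq = max(freq.values())
    total = len(stems)

    scores = {}
    for s, f in freq.items():
        if f < min_freq:
            continue  # filter out low-frequency words
        freq_factor = f / max_freq               # 1.0 for the most frequent stem
        pos_factor = 1.0 - first_pos[s] / total  # earlier first occurrence scores higher
        scores[s] = w_freq * freq_factor + w_pos * pos_factor

    # Highest combined weight first; ties are broken alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]
```

Because the score uses only statistics from the text itself, no corpus-wide document frequencies are needed, which is the practical difference from a TF-IDF approach. A real Uighur pipeline would pass a proper stemmer as `stem` and a Uighur stopword list as `stopwords`.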

Keywords/Search Tags:

weight, keywords, single text, word combinations, Uighur

Related items

1	A Uighur Words Recognition Technology Based On Contour
2	Research On Chinese Text Similarity Detection Technology Based On Word Weight Analysis
3	Illegal Experimental Application Classifier Based On Keywords
4	Automatic Extraction Of Keywords And Text Summarization In Text Mining
5	Research And Implementation Of Text Mining Technology Based On Public Security Information
6	Research On Automatic Question Answering System In Restricted Domain Based On Chinese Weighted Keywords Tree
7	Research On Keywords In Information Science Based On Word Vector
8	Method Of Webpage Keyword Extraction Based On Word Span
9	Research On Multi-Keywords Retrieval Based On HowNet
10	A Study Of Key Techniques For Uighur Handwriting Recognition