Research And Design Of Text Topic Classification Based On Key Words

Posted on: 2018-07-11
Degree: Master
Type: Thesis
Country: China
Candidate: L Hui
Full Text: PDF
GTID: 2348330518494697
Subject: Computer Science and Technology
Abstract/Summary:
With the continuing development of science and technology, the Internet has become part of everyday life, and the volume of information it carries, including scientific literature, has grown explosively. Automatic text classification is therefore increasingly important in text analysis and processing, and classifying text quickly and accurately has become an important research topic. Because traditional text classification methods are inefficient and perform poorly on large volumes of text, this thesis proposes a keyword-based text topic classification algorithm. The algorithm first extracts keywords from the text and then classifies the text using those keywords, which improves classification accuracy. It also introduces parallelism into the classification process, which improves efficiency on massive text collections. The main work of this thesis is as follows:

First, the thesis surveys the current state of traditional text classification methods, identifies their shortcomings in the context of big data, and analyzes the characteristics and advantages of keyword-based text topic classification. It then introduces in detail the algorithms involved in data acquisition and storage, text topic selection, and text classification.

Second, the thesis designs a system model for keyword-based text topic classification. Building on traditional classification methods and techniques, it analyzes the feature words produced during traditional text classification and identifies the keywords among them. Selecting topic keywords and classifying text topics with them improves classification accuracy. Because the data to be processed is large, the prototype system adopts parallel processing and uses a distributed framework to accelerate the classification workload.

Finally, the thesis implements a prototype system for keyword-based text topic classification. The prototype selects topic keywords for text topic classification and uses the Hadoop MapReduce framework to process the classification data in parallel. An evaluation of the system's running efficiency and classification performance indicates that the proposed algorithm runs more efficiently and classifies well. The prototype system has been applied in the research work of an institute.
Keywords/Search Tags: text classification, Bayes classification, text topic, Hadoop
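
The pipeline outlined in the abstract (select keywords, classify with them, aggregate counts in parallel) can be illustrated with a minimal sketch. It is not the thesis's implementation. The abstract does not say how keywords are scored, so the TF-IDF-style criterion here is an assumption. The naive Bayes classifier is inferred only from the "Bayes classification" tag. The map and reduce steps are plain Python functions that imitate the MapReduce pattern on one machine rather than calling Hadoop. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict
from functools import reduce


def tokenize(text):
    # Whitespace tokenization; a real system would need proper segmentation.
    return text.lower().split()


def select_keywords(docs, k):
    """Pick the top-k terms per class by a TF-IDF-style score (assumed criterion)."""
    n_docs = len(docs)
    df = Counter()                      # document frequency of each term
    tf_by_label = defaultdict(Counter)  # term frequency within each class
    for label, text in docs:
        tokens = tokenize(text)
        df.update(set(tokens))
        tf_by_label[label].update(tokens)
    keywords = set()
    for tf in tf_by_label.values():
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(tf, key=lambda t: (-tf[t] * math.log((1 + n_docs) / df[t]), t))
        keywords.update(ranked[:k])
    return keywords


def map_partition(partition, keywords):
    """Map step: count keyword occurrences and documents per label in one partition."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for label, text in partition:
        doc_counts[label] += 1
        word_counts[label].update(t for t in tokenize(text) if t in keywords)
    return word_counts, doc_counts


def reduce_counts(a, b):
    """Reduce step: merge the partial counts of two partitions."""
    merged = defaultdict(Counter)
    for word_counts in (a[0], b[0]):
        for label, counts in word_counts.items():
            merged[label].update(counts)
    return merged, a[1] + b[1]


def train(docs, k=2, n_partitions=2):
    keywords = select_keywords(docs, k)
    # Split the corpus into partitions, as a distributed framework would.
    partitions = [docs[i::n_partitions] for i in range(n_partitions)]
    partials = [map_partition(p, keywords) for p in partitions]
    word_counts, doc_counts = reduce(reduce_counts, partials)
    return keywords, word_counts, doc_counts


def classify(text, model):
    """Multinomial naive Bayes over keywords only, with add-one smoothing."""
    keywords, word_counts, doc_counts = model
    total_docs = sum(doc_counts.values())
    tokens = [t for t in tokenize(text) if t in keywords]
    best_label, best_score = None, -math.inf
    for label in sorted(doc_counts):
        counts = word_counts[label]
        denom = sum(counts.values()) + len(keywords)
        score = math.log(doc_counts[label] / total_docs)
        for t in tokens:
            score += math.log((counts[t] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy labelled corpus (hypothetical data, for illustration only).
TRAIN = [
    ("sports", "the team won the football match"),
    ("sports", "football players train before the match"),
    ("tech", "the cluster runs hadoop jobs"),
    ("tech", "hadoop jobs process data on the cluster"),
]

model = train(TRAIN)
print(classify("a football match tonight", model))
```

Because the reduce step only adds counts, the trained model should not depend on how the corpus is partitioned. That property is what makes this kind of counting a natural fit for a MapReduce-style framework.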