Study Of The Application Of Text Classification Techniques On Weibo

Posted on: 2016-07-26
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Wang
GTID: 2308330464470853
Subject: Computer technology
Abstract:
In today's society, the rapid development of information technology and the rapid spread of the Internet have caused digital information resources to grow dramatically. How to extract useful information from vast amounts of unstructured text has become a pressing problem in the field of information processing. The main task of text classification is to compare the features of an unclassified text with those of existing classes and, based on its semantic information, determine the category to which it belongs, so that the text is automatically assigned to one or more predefined categories. Its main purpose is to help people sort information quickly, so that they can efficiently find what they need by category.

Microblogging is a new social networking platform and a form of new media. Microblog text is short text, characterized by concise wording and a varying amount of information per post. In addition, microblog text is free in form, spreads quickly, and is real-time, and it contains a great deal of valuable information. For example, it includes people's differing views and positions on various social phenomena, with topics spanning fields such as the economy, the military, and entertainment. Microblog classification therefore has broad application prospects in interest mining, hot-topic tracking and discovery, buzzword analysis, public opinion early warning, and other areas. To date, text classification has seldom been applied to microblogs, so applying text classification technology to microblog information is of broad practical value.

This thesis applies text classification techniques to microblogs, a form of short text.
The thesis first reviews the state of research on this topic in China and abroad, then describes the general process of text classification and several of its key stages, such as preprocessing, text representation, and feature selection. It then describes typical classification algorithms used for text classification: K-nearest neighbor (KNN) and Naive Bayes. Taking into account the characteristics of microblog text and the strengths and weaknesses of the KNN algorithm, the thesis proposes an improved KNN classification algorithm. The improved KNN algorithm has advantages such as short training time and fast classification. In addition, to address the shortcomings of the improved KNN algorithm, the thesis proposes a classification algorithm based on microblog features, which incorporates information such as the microblog topic and the user's own interests into classification to further improve accuracy. Experimental results show that, for microblog text classification, the improved KNN algorithm performs better than both the original KNN algorithm and the Naive Bayes algorithm, and that supplementing it with microblog-feature-based classification further improves the results.
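
As a rough illustration of the baseline KNN approach mentioned in the abstract, the sketch below classifies short texts by cosine similarity over term-frequency vectors. It is not the thesis's improved algorithm, whose details the abstract does not give. The whitespace tokenizer, the similarity-weighted vote, the function names, and the toy English training set are all assumptions made for the example; real Weibo text would need a Chinese word segmenter and a proper feature selection step.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (assumed stand-in for real preprocessing)."""
    return text.lower().split()


def vectorize(text):
    """Text representation: a sparse term-frequency vector."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(query, training, k=3):
    """Return the label favored by the k most similar training texts.

    Votes are weighted by similarity (one common KNN variant, chosen
    here as an assumption). Returns None when no training text shares
    a term with the query.
    """
    q = vectorize(query)
    scored = sorted(
        ((cosine(q, vectorize(text)), label) for text, label in training),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter()
    for similarity, label in scored[:k]:
        if similarity > 0:
            votes[label] += similarity
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Hypothetical toy data; the categories echo the topic areas named in the abstract.
TRAINING = [
    ("stock market prices rise", "economy"),
    ("central bank raises interest rates", "economy"),
    ("new movie star wins award", "entertainment"),
    ("pop singer releases new album", "entertainment"),
    ("army holds military drill", "military"),
    ("navy launches new warship", "military"),
]

if __name__ == "__main__":
    print(knn_classify("bank interest rates fall", TRAINING))
    print(knn_classify("singer wins music award", TRAINING))
```

KNN has no training phase beyond storing the labeled examples, which fits the abstract's remark about short training time; the cost is paid at query time, when the query is compared against every stored text.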
Keywords: KNN, text classification, short text classification, Weibo