
The Study Of Micro-Blog Text Classification Base On Multi-Label Learning Framework

Posted on: 2017-03-23
Degree: Master
Type: Thesis
Country: China
Candidate: P Y Gao
Full Text: PDF
GTID: 2348330491464542
Subject: Computer technology
Abstract/Summary:
With the growth of the internet industry, a range of short-text media has emerged, such as micro-blogs, SMS, and voice messages. Compared with traditional media, these have shorter texts, faster transmission, faster information updates, and more varied text forms. However, domestic research on text classification for these new media is still at an early stage. Traditional machine learning frameworks are limited to single-label classification and to a single text-vector representation, which makes them inadequate for short text classification. The Multi-Instance Multi-Label (MIML) learning framework, unlike most classification learning frameworks, can reflect the characteristics of samples more accurately and comprehensively. It classifies messages more effectively and is better suited to short-text scenarios. In this context, this paper applies a multi-label learning algorithm to micro-blog short text classification and proposes a new method of text similarity calculation. Specifically, three aspects are involved:
(1) Study of the multi-label learning algorithm: ML-kNN is studied and improved. The algorithm's specific process, implementation principles, application scenarios, and differences from traditional supervised learning algorithms are investigated. Its deficiencies are identified in light of the characteristics of short text classification, which motivates the text similarity computing work that follows.
(2) A new text similarity computing method is proposed, and the concept of bags of category (BOC) is introduced. Drawing on short text classification and several traditional text similarity computing methods, a text similarity computing method based on BOC with MapReduce is proposed as the core method. By introducing a corpus with class labels, this algorithm addresses the problems arising from the lack of feature information and the difficulty of synonym identification.
(3) Classification is tested on Sina micro-blog data. Using association rules and an analysis of six months of Sina micro-blog data, the testing process selects micro-blog texts that are worth classifying. Through experiments against two traditional text similarity calculation methods, the classification performance of the proposed text similarity calculation method is examined within the multi-label learning algorithm.
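As a rough illustration of how ML-kNN assigns several labels to one short text, the following is a minimal sketch of the standard ML-kNN procedure (label priors plus neighbor-count likelihoods, compared with a MAP rule). It is not the improved variant studied in the thesis. The Jaccard token similarity is only a placeholder assumption for the similarity measure (the thesis proposes a BOC-based measure instead), and the class name, parameters, and toy data are all hypothetical.

```python
def jaccard(a, b):
    """Token-set Jaccard similarity. Placeholder only: the thesis proposes
    a BOC-based similarity, which is not reproduced here."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class MLkNN:
    """Minimal ML-kNN sketch over tokenized texts (hypothetical helper class)."""

    def __init__(self, k=2, smooth=1.0, sim=jaccard):
        self.k, self.s, self.sim = k, smooth, sim

    def _neighbors(self, doc, skip=None):
        # Rank training docs by similarity (ties broken by index), keep top k.
        scored = [(self.sim(doc, d), i)
                  for i, d in enumerate(self.docs) if i != skip]
        scored.sort(key=lambda t: (-t[0], t[1]))
        return [i for _, i in scored[:self.k]]

    def fit(self, docs, label_sets):
        self.docs = docs
        self.label_sets = [set(ls) for ls in label_sets]
        self.labels = sorted(set().union(*self.label_sets))
        m, k, s = len(docs), self.k, self.s
        # Smoothed prior probability that a text carries each label.
        self.prior = {l: (s + sum(l in ls for ls in self.label_sets)) / (2 * s + m)
                      for l in self.labels}
        # with_label[l][j]: training texts WITH label l whose k neighbors
        # include exactly j texts carrying l; without_label is the complement.
        self.with_label = {l: [0] * (k + 1) for l in self.labels}
        self.without_label = {l: [0] * (k + 1) for l in self.labels}
        for i, doc in enumerate(docs):
            nbrs = self._neighbors(doc, skip=i)
            for l in self.labels:
                j = sum(l in self.label_sets[n] for n in nbrs)
                table = self.with_label if l in self.label_sets[i] else self.without_label
                table[l][j] += 1
        return self

    def predict(self, doc):
        nbrs = self._neighbors(doc)
        k, s = self.k, self.s
        predicted = set()
        for l in self.labels:
            c = sum(l in self.label_sets[n] for n in nbrs)
            p1 = (s + self.with_label[l][c]) / (s * (k + 1) + sum(self.with_label[l]))
            p0 = (s + self.without_label[l][c]) / (s * (k + 1) + sum(self.without_label[l]))
            # MAP rule: keep the label if "has label" is the likelier hypothesis.
            if self.prior[l] * p1 > (1 - self.prior[l]) * p0:
                predicted.add(l)
        return predicted


# Toy, made-up training data: tokenized short texts with label sets.
train_docs = [
    ["football", "match", "goal"], ["football", "goal", "team"], ["match", "team", "goal"],
    ["stock", "market", "price"], ["stock", "price", "bank"], ["market", "bank", "price"],
    ["club", "sponsor", "deal"], ["club", "deal", "transfer"], ["sponsor", "transfer", "deal"],
]
train_labels = [
    {"sports"}, {"sports"}, {"sports"},
    {"finance"}, {"finance"}, {"finance"},
    {"sports", "finance"}, {"sports", "finance"}, {"sports", "finance"},
]
model = MLkNN(k=2).fit(train_docs, train_labels)
single = model.predict(["football", "team", "score"])
multi = model.predict(["club", "sponsor", "transfer"])
```

Because the similarity function is passed in as a parameter, a different measure (for example, one built on category information from a labeled corpus) could be swapped in without changing the classifier itself.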
Keywords/Search Tags: MIML learning framework, Micro-blog text classification, Text similarity computing