The Research On IRT And Rule Space For Text Classification

Posted on: 2006-08-29  Degree: Master  Type: Thesis
Country: China  Candidate: W Zhu  Full Text: PDF
GTID: 2168360152982899  Subject: Computer software and theory
Abstract/Summary:
The rapid development of the Internet and related technology has led to a rapid increase in the amount of available information. To use these resources efficiently, they urgently need to be sorted into categories according to their content. Most of this data is represented as text; hence it is very useful to study techniques for classifying text automatically.

In this thesis, we propose a new text classification algorithm based on Item Response Theory (IRT) and Rule Space theory. The procedure is as follows: the keywords of each category are selected from the training set and treated as test items; the dimension-reduced vector of each text is regarded as an examinee's item response pattern; and the set of vectors for all training texts forms a score matrix. From this matrix we estimate the item parameters of each keyword and define the Tatsuoka Rule Space Model. To classify a text, we use the fitted IRT model to evaluate the test text and assign its category label based on the defined Tatsuoka Rule Space.

To improve the efficiency of text classification, we propose a new parameter-estimation method that is faster and more effective. The results of Monte Carlo simulation demonstrate that the minimum χ²/EM algorithm is more effective for item and ability parameter recovery.

The main contributions of this thesis are:
· Going beyond the conventional study of IRT by applying it to text classification for the first time. The experimental results show that the new method achieves better recall at low cost, while its precision still needs improvement.
· Introducing a new method for estimating the parameters of the IRT model. The experimental data indicate that the new method can be applied to all cases and that the estimates are stable even with few items and few examinees; a test that includes a few unusual response patterns can also be evaluated; and, with mean absolute error as the criterion, the results are acceptable compared with those of comparable software for the 2PLM.
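To make the keyword-as-item analogy concrete, the following is a minimal Python sketch, not the thesis's implementation. It assumes a two-parameter logistic model (the 2PLM mentioned above), a binary response pattern built from keyword presence, and a plain grid-search maximum-likelihood estimate of the ability parameter. The function names, the whitespace tokenizer, and the item parameters are all hypothetical; the minimum χ²/EM estimator and the Rule Space classification step are not reproduced here.

```python
import math

def p_2pl(theta, a, b):
    """2PL item response function: P(response = 1 | ability theta).
    a is the item discrimination, b is the item difficulty."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def response_pattern(text, keywords):
    """Binary 'item response pattern' of a text: 1 if the keyword occurs, else 0.
    Whitespace tokenization is an assumption made only for this illustration."""
    tokens = set(text.lower().split())
    return [1 if k in tokens else 0 for k in keywords]

def log_likelihood(theta, responses, items):
    """Log-likelihood of a response pattern given (a, b) pairs for each item."""
    ll = 0.0
    for u, (a, b) in zip(responses, items):
        p = p_2pl(theta, a, b)
        ll += math.log(p) if u else math.log(1.0 - p)
    return ll

def estimate_theta(responses, items, lo=-4.0, hi=4.0, steps=801):
    """Grid-search MLE of theta on [lo, hi] (a simple stand-in estimator)."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda t: log_likelihood(t, responses, items))

# Hypothetical keywords and item parameters for one category.
keywords = ["goal", "match", "scored"]
items = [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0)]  # (a, b) per keyword

pattern = response_pattern("The goal was scored late", keywords)
theta = estimate_theta(pattern, items)
```

Under this reading, a text that matches more of a category's keywords receives a higher ability estimate for that category; the thesis then uses the Tatsuoka Rule Space, rather than the ability estimate alone, to decide the label.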
Keywords/Search Tags: Item Response Theory (IRT), parameter estimation, Tatsuoka rule space, text classification