
Research On Semantic Similarity Based On Text Categorization

Posted on: 2017-02-27
Degree: Master
Type: Thesis
Country: China
Candidate: C L Wang
GTID: 2348330485986309
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of Internet technology, text of all kinds has flooded onto the Internet and continues to grow at high speed, so selecting relevant text by manual screening is impractical. Intelligent computing techniques are therefore needed to collect the various types of messages on the network automatically. Their main purpose is to classify unorganized texts and to calculate semantic similarity within the same category. Text classification also plays an important role in information retrieval, information filtering, automatic classification, and related tasks. This thesis studies two aspects. First, it builds a text sentiment classification system on a hybrid framework that combines improved Semantic Comprehension with Machine Learning. Second, it divides texts into subclasses using the relevant theory of text sentiment classification, calculates semantic similarity within those subclasses, and applies the method to the test-question classification module of the Construction Testing Training System. The results are summarized as follows:
(1) A concept emotion similarity formula is proposed that introduces emotional sememes, addressing the problem that HowNet word similarity calculation ignores the emotional factors of words. The accuracy of word similarity computation is improved to a certain extent.
(2) The SO_PMI method is refined by applying the author's own formula for choosing paradigm words and window size. In addition, synonyms are introduced into SO_PMI, which alleviates data sparsity and accommodates people's habits of expression.
(3) Mutual information feature selection is improved by taking into account word frequency, dispersion, and the effects of positively and negatively correlated vocabulary items.
(4) To address the deficiencies of Semantic Comprehension and Machine Learning in text sentiment classification, improved Semantic Comprehension is combined with Machine Learning, which improves the accuracy and portability of the classifier.
(5) Because existing semantic similarity methods have shortcomings, a new approach is proposed that brings text classification into semantic similarity calculation. In other words, semantic similarity needs to be calculated only between texts in the same category.
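For readers unfamiliar with SO_PMI, the following is a minimal Python sketch of the standard formulation (semantic orientation as the difference between a word's pointwise mutual information with positive and with negative paradigm words). It is not the thesis's improved variant: the paradigm words, the window size, the smoothing constant, the toy corpus, and all function names are assumptions chosen only for illustration, and the synonym expansion mentioned in the abstract is not implemented.

```python
import math
from collections import Counter


def count_cooccurrences(docs, window=2):
    """Count word occurrences and unordered co-occurrences within `window` tokens."""
    word_counts = Counter()
    pair_counts = Counter()
    for tokens in docs:
        for i, w in enumerate(tokens):
            word_counts[w] += 1
            # Look only at the next `window` tokens so each pair is counted once.
            for v in tokens[i + 1 : i + 1 + window]:
                if v != w:
                    pair_counts[frozenset((w, v))] += 1
    return word_counts, pair_counts


def pmi(a, b, word_counts, pair_counts, smoothing=0.01):
    """Estimate PMI(a, b) = log2(p(a, b) / (p(a) * p(b))) from raw counts.

    `smoothing` (an assumed constant) keeps the logarithm finite when the
    two words never co-occur. Words unseen in the corpus contribute 0.
    """
    if word_counts[a] == 0 or word_counts[b] == 0:
        return 0.0
    total = sum(word_counts.values())
    joint = pair_counts[frozenset((a, b))] + smoothing
    return math.log2(joint * total / (word_counts[a] * word_counts[b]))


def so_pmi(word, positive, negative, word_counts, pair_counts):
    """Semantic orientation: PMI with positive seeds minus PMI with negative seeds."""
    pos = sum(pmi(word, p, word_counts, pair_counts) for p in positive)
    neg = sum(pmi(word, n, word_counts, pair_counts) for n in negative)
    return pos - neg


# Hypothetical toy corpus and paradigm words, for illustration only.
DOCS = [
    ["service", "good", "excellent"],
    ["food", "good", "tasty"],
    ["service", "bad", "slow"],
    ["food", "tasty", "good"],
    ["room", "bad", "dirty"],
]
POSITIVE = ["good"]
NEGATIVE = ["bad"]

WORD_COUNTS, PAIR_COUNTS = count_cooccurrences(DOCS, window=2)
for candidate in ("tasty", "dirty"):
    score = so_pmi(candidate, POSITIVE, NEGATIVE, WORD_COUNTS, PAIR_COUNTS)
    print(candidate, "positive" if score > 0 else "negative")
```

A score above zero suggests a positive orientation and a score below zero a negative one. With a corpus this small the counts are extremely sparse, which is the problem the abstract says synonym expansion is meant to alleviate.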
Keywords/Search Tags:Text sentiment classification, Semantic similarity, Hybrid frame, SO_PMI, Mutual Information