
Research On The Method Of Text Categorization Based On Semantic Similarity

Posted on: 2018-04-26 | Degree: Master | Type: Thesis
Country: China | Candidate: X Zhao
GTID: 2348330563452386 | Subject: Software engineering
Abstract/Summary:
As an important part of data mining, text classification is widely used in information filtering, personalized recommendation, search engines, digital libraries and other fields, and therefore has strong practical significance. With the growth of the Internet, however, text classification research faces two hard-to-avoid problems. First, text data sets have become so large that the amount of computation and the load on hardware are excessive; segmenting the data set efficiently and correctly, and choosing the subset that is helpful to classification, is the key to relieving this pressure. Second, synonyms and polysemous words are difficult to handle. Many researchers have tried to find a breakthrough in the particular property of text data, namely its semantics, but dealing with polysemous words and synonyms remains a major problem that researchers need to solve.

To address these two problems, this paper first proposes a data segmentation method based on the K-nearest neighbor (KNN) algorithm. For each test sample, the method selects the several categories most similar to that sample and composes them into a sub-data set, which solves the problems caused by an overly large data set. Second, to reduce the influence of polysemous words and synonyms on classification results, this paper presents a feature selection method based on semantic similarity, which is described in detail with a flow chart. WordNet is used to calculate the similarity between the feature words in the text, and the feature extraction phase transforms the text set into a feature matrix based on semantic similarity. Combining the feature selection method with the data segmentation method, a text classification method based on semantic similarity is proposed. Comparative experiments verify that this method can improve the accuracy of the classifier.

Finally, a text classification system based on semantic similarity is designed and implemented. The design requirements of the system, its structure, the function of each module, and the key classes used in the implementation are described. The contents and workflow of each module are explained through module flow charts. The classification interface and the parameter-setting interface of the system are presented as screenshots, and the implementation process of the system is described in detail with flowcharts.
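The abstract does not give the exact formulas, so the following is only a minimal Python sketch of the two ideas under stated assumptions. The term-term similarity matrix `sim` stands in for WordNet scores (the values below are made up by hand); each semantic feature is assumed to be a similarity-weighted sum of term frequencies; documents are compared by cosine similarity; and `semantic_features`, `knn_sub_dataset`, `k` and `n_categories` are hypothetical names and parameters, not the thesis's own code.

```python
import math
from collections import Counter


def semantic_features(tf_rows, sim):
    """Turn term-frequency rows into similarity-weighted feature rows.

    tf_rows: documents as lists of term counts over one shared vocabulary.
    sim: square term-term similarity matrix with 1.0 on the diagonal; the
         thesis derives such scores from WordNet, here they are assumed inputs.
    Feature j of a document is sum_i tf[i] * sim[i][j], so a synonym of term j
    also contributes to feature j in proportion to its similarity (assumption).
    """
    n = len(sim)
    return [[sum(row[i] * sim[i][j] for i in range(n)) for j in range(n)]
            for row in tf_rows]


def cosine(a, b):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_sub_dataset(train_x, train_y, test_x, k=3, n_categories=2):
    """Build a sub-data set for one test sample (hypothetical k, n_categories).

    The k training samples most similar to test_x vote for their categories;
    only training samples from the n_categories most-voted categories are kept.
    """
    ranked = sorted(range(len(train_x)),
                    key=lambda i: cosine(train_x[i], test_x), reverse=True)
    votes = Counter(train_y[i] for i in ranked[:k])
    keep = {label for label, _ in votes.most_common(n_categories)}
    pairs = [(x, y) for x, y in zip(train_x, train_y) if y in keep]
    return [x for x, _ in pairs], [y for _, y in pairs], keep


if __name__ == "__main__":
    # Columns: car, automobile, fruit, apple, goal.
    # Made-up similarity scores standing in for WordNet.
    sim = [[1.0, 0.9, 0.0, 0.0, 0.0],
           [0.9, 1.0, 0.0, 0.0, 0.0],
           [0.0, 0.0, 1.0, 0.8, 0.0],
           [0.0, 0.0, 0.8, 1.0, 0.0],
           [0.0, 0.0, 0.0, 0.0, 1.0]]
    train_tf = [[2, 0, 0, 0, 0], [0, 3, 0, 0, 0], [0, 0, 1, 1, 0],
                [0, 0, 0, 2, 0], [0, 0, 0, 0, 2], [1, 0, 0, 0, 3]]
    train_y = ["vehicle", "vehicle", "food", "food", "sport", "sport"]
    test_tf = [0, 1, 0, 0, 0]  # a document that only says "automobile"

    train_x = semantic_features(train_tf, sim)
    test_x = semantic_features([test_tf], sim)[0]
    sub_x, sub_y, keep = knn_sub_dataset(train_x, train_y, test_x)
    print(sorted(keep))  # ['sport', 'vehicle']
    print(len(sub_y))    # 4
```

In this toy run, the "automobile" test document becomes close to the "car" training document even though they share no literal term, and the "food" category is dropped because none of the three nearest neighbours belongs to it. The remaining sub-data set would then go to the final classifier (the keywords mention a support vector machine), so training and prediction only involve the candidate categories.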
Keywords/Search Tags: text categorization, semantic similarity, latent semantic analysis, support vector machine, data set splitting