Research On The Method Of Text Categorization Based On Semantic Similarity

Posted on:2018-04-26

Degree:Master

Type:Thesis

Country:China

Candidate:X Zhao

Full Text:PDF

GTID:2348330563452386

Subject:Software engineering

Abstract/Summary:

As an important part of data mining, text classification has been widely used in information filtering, personalized recommendation, search engines, digital libraries and other fields, and it has strong practical significance. However, with the development of the Internet, text classification research has encountered two problems that are difficult to avoid. First, text data sets have become so large that they require a large amount of computation and place a heavy burden on hardware; segmenting the data set efficiently and correctly, and choosing the subset that is helpful for classification, is the key to easing this pressure. Second, synonyms and polysemous words are difficult to handle. Many researchers try to find a breakthrough in the distinguishing property of text data, its semantics, but how to deal with polysemous words and synonyms in text remains a major problem that researchers need to solve.

To solve the first problem, this paper proposes a data segmentation method based on the K-nearest neighbor algorithm. For each test sample, the method selects the several categories most similar to that sample to compose a sub-data set, which addresses the problems caused by an overly large data set. To reduce the influence of polysemous words and synonyms on classification results, this paper presents a feature selection method based on semantic similarity, which is introduced in detail by means of a flow chart. WordNet is used to calculate the similarity between feature words in the text, and the feature extraction phase transforms the text set into a feature matrix based on semantic similarity. Based on the feature selection method and the data segmentation method, a text classification method based on semantic similarity is proposed. Comparison experiments verify that this method can improve the accuracy of the classifier.

Finally, a text classification system based on semantic similarity is designed and implemented. The design requirements of the system, the system structure, the function of each module and the key classes in the implementation are described. The contents and workflow of each module are described through module flow charts. The classification interface and the parameter-setting interface are presented as pictures, and the implementation process of the system is described in detail in the form of flowcharts.
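
The data segmentation idea in the abstract (pick the categories most similar to a test sample and keep only their documents) can be illustrated with a minimal Python sketch. The abstract does not state the document representation, the similarity measure, or how categories are chosen from the neighbours, so everything here is an assumption: bag-of-words term counts, cosine similarity, and a majority vote among the k nearest training documents. The names `cosine`, `select_sub_dataset`, `k` and `n_categories` are hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (dict-like)."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_sub_dataset(test_doc, train_docs, k=5, n_categories=2):
    """Keep only the training documents whose category is among the
    n_categories most frequent labels of the k nearest neighbours.

    test_doc is a list of tokens; train_docs is a list of (tokens, label).
    Assumed scheme for illustration, not the thesis's exact procedure.
    """
    test_vec = Counter(test_doc)
    # Score every training document against the test sample.
    scored = [(cosine(test_vec, Counter(tokens)), label)
              for tokens, label in train_docs]
    # Take the k most similar training documents.
    neighbours = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    # Count which categories those neighbours belong to.
    votes = Counter(label for _, label in neighbours)
    kept = {label for label, _ in votes.most_common(n_categories)}
    # The sub-data set is every training document from the kept categories.
    return [(tokens, label) for tokens, label in train_docs if label in kept]


# Toy data, invented purely for illustration.
train = [
    (["stock", "market", "price"], "finance"),
    (["bank", "loan", "price"], "finance"),
    (["match", "goal", "team"], "sports"),
    (["team", "coach", "season"], "sports"),
    (["virus", "vaccine", "trial"], "health"),
]
test = ["market", "price", "bank"]
sub = select_sub_dataset(test, train, k=2, n_categories=1)
```

In this toy run the two nearest neighbours of the test sample are both finance documents, so the classifier would then be trained on the finance subset only. In practice `n_categories` would be larger than one, so that a downstream classifier still has several candidate categories to choose between.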

Keywords/Search Tags:

text categorization, semantic similarity, latent semantic analysis, support vector machine, data set splitting

Related items

1	Research On Support Vector Machines Classification Algorithm In Text Categorization
2	Research On Web Text Categorization Based On Latent Semantic Analysis
3	Research And Implementation Of Chinese Text Categorization System Based On Semantic Similarity
4	Research On Text Classification Filtering Technology Based On Latent Semantic Indexing And Support Vector Machine
5	Research On Ontology-Based Semantic Text Categorization
6	The Implementation And Research Of The Probabilistic Latent Semantic Analysis Model In The Search Engine's Business Text Classification System
7	Latent Semantic Analysis In Language Identification
8	Research And Apply On Patient Record Text Mining Based On Latent Semantic Analysis
9	Research On Text Classification Based On Ontology And Latent Semantic Indexing Algorithm
10	Research On Text Sentiment Analysis Based On Support Vector Machine