
The Research Of Semantic Kernel In SVM For Chinese Text Classification

Posted on: 2018-06-22 | Degree: Master | Type: Thesis
Country: China | Candidate: J Tan | Full Text: PDF
GTID: 2428330620457778 | Subject: Computer Science and Technology
Abstract/Summary:
Classification is a basic problem in artificial intelligence. Among the many kinds of classifier, the support vector machine (SVM) has good learning ability because it follows the principle of structural risk minimization. It relies on the kernel trick, which implicitly maps data points that are not linearly separable into a feature space where they are separable, while computing their similarity directly from the original representation.

Before any text processing task, text data must be preprocessed to remove useless linguistic features. Even so, many redundant features remain because of the semantic relations between terms. Most current text classification methods assume that the feature dimensions are independent of one another. How to exploit the semantic relations in text to improve classification performance is therefore a current research focus.

People understand each other's messages through semantics, but semantic relations are very difficult to express in a computer. At present they are represented with ontologies, knowledge graphs, and statistics. Knowledge graphs and ontologies reflect semantic relations well, but building them requires a great deal of human effort, and they cannot cover every semantic relation in a language. Statistical methods, which rely on the internal statistical properties of a text corpus, can represent semantic relations in any language. Combining the two approaches can therefore express the semantic relations in text better than either one alone.

A classification problem becomes much easier to solve once a suitable kernel function is found. For text classification, this means finding a semantic kernel that exploits the relations between words to improve classification performance. Semantic kernels have already been applied widely in English text processing. They fall into two groups: those based on an ontology or knowledge graph, and those based on statistics. Since each group has its own limitations, it is worth combining the two to improve classification performance.

Building on a detailed study of existing semantic kernel functions, the main contents of this thesis are as follows:
(1) An introduction to techniques for automatic text classification, especially SVM.
(2) An introduction to the kernel trick and to semantic kernels for text classification.
(3) External knowledge is used to address the fact that the bag-of-words representation of text does not account for polysemy and synonymy.
(4) A semantic kernel function is proposed that draws on semantic relations from both the ontology and the training set. The kernel accounts for the correlation between the dimensions of the text representation.
(5) A Chinese text classification system based on semantic information and SVM is constructed. The experimental results show that the proposed kernel function can make use of the external knowledge to improve classification performance.
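
The idea of a semantic kernel over bag-of-words vectors can be illustrated with a small sketch. This is not the kernel proposed in the thesis; it is a generic construction in which each document vector is first smoothed by a term-term proximity matrix S and the usual inner product is then taken, giving k(d1, d2) = (d1 S)(d2 S)^T. The vocabulary, the values in S, and all function names are hypothetical, and the tokens are assumed to be already segmented (for Chinese text, a word segmenter would run first).

```python
# Illustrative sketch only: the vocabulary, the proximity values in S, and the
# function names are assumptions, not taken from the thesis.

VOCAB = ["car", "automobile", "bank", "river"]

# Hypothetical symmetric term-term proximity matrix; rows and columns follow VOCAB.
# In practice the entries could come from an ontology, corpus statistics, or both.
S = [
    [1.0, 0.9, 0.0, 0.0],
    [0.9, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.3],
    [0.0, 0.0, 0.3, 1.0],
]


def bag_of_words(tokens):
    """Count how often each vocabulary term occurs in a tokenized document."""
    index = {term: i for i, term in enumerate(VOCAB)}
    vec = [0.0] * len(VOCAB)
    for tok in tokens:
        if tok in index:
            vec[index[tok]] += 1.0
    return vec


def smooth(vec):
    """Multiply the row vector `vec` by S, spreading weight to related terms."""
    n = len(S[0])
    return [sum(vec[i] * S[i][j] for i in range(len(vec))) for j in range(n)]


def linear_kernel(x, y):
    """Plain inner product: treats every term as independent of the others."""
    return sum(a * b for a, b in zip(x, y))


def semantic_kernel(x, y):
    """Inner product of the smoothed vectors, i.e. x S S^T y^T."""
    return linear_kernel(smooth(x), smooth(y))


d1 = bag_of_words(["car"])
d2 = bag_of_words(["automobile"])
print(linear_kernel(d1, d2))    # no shared term, so the documents look unrelated
print(semantic_kernel(d1, d2))  # synonyms now contribute to the similarity
```

With these made-up values the linear kernel gives 0 for the two one-word documents, while the semantic kernel gives roughly 1.8. Because the kernel is an inner product of transformed vectors, its Gram matrix is positive semi-definite, so it could be passed to an SVM implementation that accepts a precomputed kernel matrix.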
Keywords/Search Tags: Text Classification, Semantic Kernel, Ontology, SVM, Semantic Relatedness