
Research On Multilingual Short Text Classification Method Based On Deep Learning

Posted on: 2019-03-26
Degree: Master
Type: Thesis
Country: China
Candidate: J Liu
Full Text: PDF
GTID: 2428330545458880
Subject: Computer application technology
Abstract/Summary:
Multilingual interaction is an important research area of natural language processing, and the demand for it is unavoidable, so the analysis and fusion of data in different languages have become indispensable. The rules of most existing text classifiers are trained for a single language; when the language domain changes, a classifier must often learn new rules for the data set of each language. Research on multilingual text categorization is therefore of great value for both theory and application. This dissertation takes the abstracts of scientific and technical literature as the target for classifying multilingual short texts in Chinese, English, and Korean. A transformation and fusion strategy for multilingual text features is adopted to solve the problem of domain adaptation of classifiers across languages, and the accuracy of the classifier is improved with a deep learning strategy, providing a foundation for multilingual information processing.

Firstly, more than 90,000 abstracts of scientific and technical literature collected from a multilingual literature management system project were used to construct a parallel corpus of three languages: Chinese, English, and Korean. The abstracts contain 100 to 300 words each and are characterized by a large amount of terminology and fuzzy classification boundaries, so it is difficult to classify them accurately using word features and existing probabilistic statistical models. Hence, the semantics of the abstracts were modeled with a deep learning method.

Secondly, the relationships between features in different language spaces were obtained statistically so as to establish an auto-associative memory among the languages. All data were then expressed completely in a multilingual model space by expanding the monolingual text data according to the auto-associative memory relation.

Finally, the local perception and weight sharing of convolutional neural networks were applied to fuse the complex semantic expressions in the auto-associative memory model, yielding phrase features of different lengths. A deep neural network then learned a dense combination of high-level semantic features for text in any of the languages and produced the classification predictions. The extended convolutional neural network model improves the classification accuracy effectively.

The method proposed in this dissertation greatly reduces the dependence of multilingual text classification on parallel corpora, because the test data can come from any language as long as that language is included in the training corpus. Experiments show that the convolutional neural network combined with auto-associative memory improves classification accuracy by 2% to 6% on multilingual text classification compared with other classic models. In addition, the model is also suitable for positive/negative sentiment classification on a cross-language sentiment corpus, where it is considerably more effective than existing algorithms, which testifies that the model is robust on review text.
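The sketch below illustrates, in PyTorch, the kind of convolutional classifier the abstract describes: convolution filters of several widths act as local perception with weight sharing to extract phrase features of different lengths over a shared embedding space, and a dense layer produces the class prediction. It is a minimal illustration under stated assumptions, not the thesis implementation; all class names, hyperparameters, and the omitted auto-associative expansion step are assumptions added here.

```python
# Minimal sketch (assumed architecture, not the thesis code): multi-width 1-D
# convolutions over shared embeddings, max-pooling over time, dense classifier.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultilingualTextCNN(nn.Module):
    def __init__(self, vocab_size, embed_dim=300, num_classes=10,
                 kernel_sizes=(2, 3, 4), num_filters=100):
        super().__init__()
        # One shared embedding table; in the dissertation the monolingual input
        # would first be expanded into the multilingual space via the learned
        # auto-associative memory relation (not shown here).
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # Local perception with weight sharing: one convolution per phrase
        # length (kernel width), giving phrase features of different lengths.
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, num_filters, k) for k in kernel_sizes]
        )
        self.dropout = nn.Dropout(0.5)
        self.fc = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, token_ids):                        # (batch, seq_len)
        x = self.embedding(token_ids).transpose(1, 2)    # (batch, embed_dim, seq_len)
        # Max-pool each feature map over time, then concatenate all widths.
        pooled = [F.relu(conv(x)).max(dim=2).values for conv in self.convs]
        features = self.dropout(torch.cat(pooled, dim=1))
        return self.fc(features)                         # class logits
```

In this layout, each kernel width plays the role of a fixed phrase length, and the concatenated pooled features stand in for the "dense combination of high-level semantic features" that the final fully connected layer maps to class scores.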
Keywords/Search Tags:Text classification, Deep learning, Auto-associative memory, Convolutional neural network, Word embedding, Local perception