
Research And Implementation Of Multilingual Text Classification System Based On Deep Learning

Posted on: 2020-05-17
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Meng
GTID: 2428330572989367
Subject: Computer technology
Abstract/Summary:
With the development of information technology and globalization, the analysis and sharing of multilingual text has become an indispensable part of people's life and work, and research on multilingual text classification is therefore increasingly important. Most existing text classification research targets a single language. To handle texts in different languages, it is often necessary to train several monolingual classification systems to cover a multilingual data set, which is costly. A multilingual text classification system is therefore needed to meet users' changing needs. In this dissertation, we researched and developed a deep-learning-based multilingual text classification system for abstracts of scientific and technological literature in Chinese, English and Korean. To overcome the language barrier, we extract features from each language independently and then merge them. We built a deep neural network model to improve classification performance, and then designed and implemented a multilingual text classification system, laying the technical foundation for a cross-language sharing platform for Chinese, English and Korean literature. First, we collected more than 90,000 abstracts of scientific and technological literature in Chinese, English and Korean, divided them into 13 categories according to content, and organized them into a multilingual parallel corpus. Second, we proposed a multilingual text classification model based on bidirectional long short-term memory (BiLSTM) and convolutional neural networks (CNN). The text representation of each language is composed of topic vectors and word vectors and is fed into the corresponding sub-network to extract deeper text features for that language; the features of all languages are then fused to produce the final classification result. Finally, we analyzed and designed the functional modules of the system and implemented an automatic classification system for multilingual texts. The system can classify texts in any of Chinese, English and Korean and store them by category. It also lets users modify categories and view documents for convenient management. In addition, users can update the classifier online as needed, which gives them more control while maintaining classification accuracy. The proposed multilingual text classification model reduces the dependence on external resources. Experimental results show that the proposed model based on BiLSTM and CNN improves classification accuracy by 2 to 5 percentage points compared with other classical methods. The multilingual text classification system designed and implemented in this dissertation is functionally complete and meets the needs of practical application.
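The data flow described in the abstract (a topic vector combined with word vectors per language, one feature extractor per language, then fusion and a 13-way classifier) can be illustrated with a minimal Python sketch. This is not the thesis implementation: the `encode` function is a simple mean-and-max pooling stand-in for the BiLSTM-CNN sub-network, the fusion is assumed to be plain concatenation, and the dimensions, language codes, random inputs and all function names are hypothetical.

```python
import math
import random

LANGS = ("zh", "en", "ko")   # hypothetical codes for Chinese, English, Korean
NUM_CLASSES = 13             # the abstract reports 13 categories


def build_input(word_vecs, topic_vec):
    """Append the document-level topic vector to every word vector."""
    return [w + topic_vec for w in word_vecs]


def encode(seq):
    """Stand-in encoder (NOT a BiLSTM-CNN): mean- and max-pool over tokens."""
    dim = len(seq[0])
    mean = [sum(tok[i] for tok in seq) / len(seq) for i in range(dim)]
    peak = [max(tok[i] for tok in seq) for i in range(dim)]
    return mean + peak


def fuse(features_by_lang):
    """Assumed fusion: concatenate per-language features in a fixed order."""
    fused = []
    for lang in LANGS:
        fused.extend(features_by_lang[lang])
    return fused


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(fused, weights, bias):
    """Single linear layer followed by softmax over the categories."""
    logits = [sum(w * x for w, x in zip(row, fused)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)


# Toy parallel document: random vectors stand in for real embeddings.
rng = random.Random(0)
WORD_DIM, TOPIC_DIM, NUM_TOKENS = 4, 3, 5
doc = {
    lang: {
        "words": [[rng.uniform(-1, 1) for _ in range(WORD_DIM)]
                  for _ in range(NUM_TOKENS)],
        "topic": [rng.random() for _ in range(TOPIC_DIM)],
    }
    for lang in LANGS
}

# Each language is encoded independently, then the features are merged.
features = {lang: encode(build_input(d["words"], d["topic"]))
            for lang, d in doc.items()}
fused = fuse(features)

# Untrained random classifier weights, for shape illustration only.
weights = [[rng.uniform(-0.1, 0.1) for _ in fused] for _ in range(NUM_CLASSES)]
bias = [0.0] * NUM_CLASSES
probs = classify(fused, weights, bias)
predicted = max(range(NUM_CLASSES), key=probs.__getitem__)
print(len(fused), predicted)
```

In a real system each `encode` call would be replaced by a trained, language-specific sub-network, and the classifier weights would be learned jointly; the sketch only shows where the per-language representations meet.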
Keywords/Search Tags: multilingual text classification, topic model, word embedding, long short-term memory, convolutional neural network