Analysis Of Text Information Based On Deep Learning

Posted on:2019-10-30

Degree:Master

Type:Thesis

Country:China

Candidate:Y L Su

Full Text:PDF

GTID:2438330566473383

Subject:Information and Communication Engineering

Abstract/Summary:

As the information on the Internet grows more massive, diverse and fragmented, it becomes difficult to capture useful information quickly and accurately. How to extract and represent text information for natural language processing is therefore an urgent issue. In addition, with the rapid development of Internet new media, accurately classifying original texts and making recommendations based on users' interests is also an urgent problem. This dissertation therefore studies word segmentation, vectorized text representation, multi-feature integration and classification. The work can help researchers pursue further application studies of deep learning networks in natural language processing, and can also provide technical services for Internet new media. The main work and results are summarized below:

A. After the Maximum Matching (MM) and Hidden Markov (HM) approaches are compared and analyzed in terms of their application scope, strengths and shortcomings, an improved word-segmentation method is developed, which evaluates segmentation results by introducing the word-tagging idea of the HM model into the MM model. Comparative experiments show that the richness of the dictionary determines whether the MM method can segment words effectively; that the HM method is a low-efficiency approach; and that the improved word-segmentation method segments text correctly with high probability while also handling ambiguous words effectively.

B. For the problems of text vectorization and classification, an improved text-vectorization approach is first designed to reflect the feature information of texts, in which a text vector is obtained by linearly weighting the feature vectors produced by the TextRank keyword extraction approach and word2vec in terms of the TF-IDF word-frequency feature vectors. An improved k-nearest neighbor algorithm based on multi-feature integration is then developed to carry out text classification, in which an adaptive correction rule determines the value of k by using the proportion of the largest class within a small region and the density of points. Comparative experiments indicate that the improved k-nearest neighbor algorithm is clearly superior to the compared approaches in both classification quality and efficiency.

C. Because conventional classification approaches cannot handle large-scale data classification, a text classification approach based on multi-feature fusion is proposed using linked feature vectors. Each linked feature vector is obtained by linking a feature vector from the feature vectorization approach with another feature vector from the convolutional neural network, where the latter is obtained from a feature matrix produced by the TextRank keyword extraction approach and word2vec. Comparative experiments show that the approach carries out text classification effectively and that its efficiency is superior to that of the conventional recurrent neural network.
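
As a rough illustration of the baseline discussed in item A, the following is a minimal sketch of plain forward maximum matching. It is not the improved method of the thesis, whose details the abstract does not give. The function name, the `max_len` parameter, the toy dictionary and the sample sentence are all assumptions chosen for illustration.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Greedy forward maximum matching (baseline MM, not the thesis's improved method).

    At each position, take the longest substring (up to max_len characters)
    found in the dictionary; fall back to a single character if none matches.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until something matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical toy dictionary and sentence, chosen to show an ambiguous split.
toy_dictionary = {"研究", "研究生", "生命", "起源"}
print(forward_max_match("研究生命起源", toy_dictionary))
# prints ['研究生', '命', '起源']
```

The greedy match picks the longer entry "研究生" first, so the intended reading "研究 / 生命 / 起源" is missed. This is the kind of ambiguity, together with the dependence on dictionary coverage, that the abstract says the improved method is meant to address.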

Keywords/Search Tags:

Keyword extraction, Text vectorization, Adaptive k-nearest neighbor classification algorithm, Convolutional neural network, Text classification

Related items

1	Research On Chinese Text Classification Based On Keyword Strategy And CNN
2	Improved Word Embedding And K-nearest Neighbor Algorithm For Chinese Text Classification
3	Research On News Text Classification Based On Convolutional Neural Network
4	Research And Application Of Web Text Classification
5	Research On Keyword Extraction Technology Oriented To Conversational Text
6	Research And Implementation On Chinese Text Classification Algorithm Based On Convolutional Neural Network
7	Research And Implementation Of Text Classification Algorithm Based On Three-way Decision And Convolution Neural Network
8	Research On Text Classification Algorithm Based On Mixed Convolution
9	Research And Implementation Of Chinese Text Classification Algorithm Based On Machine Learning
10	Application Of Natural Neighbor In Text Classification