Exploring Dialogue Text Classification Based On Word Mixture Vectors

Posted on:2021-01-03

Degree:Master

Type:Thesis

Country:China

Candidate:Y J Yu

Full Text:PDF

GTID:2518306302954259

Subject:Applied Statistics

Abstract/Summary:

In recent years, natural language processing has received extensive attention from both academia and industry. As one of its important applications, human-machine dialogue technology has been studied by many scholars. Intent recognition, a key task in a human-machine dialogue system, enables chatbots to understand texts semantically and classify them into the correct category. The performance of the intent recognition module affects the quality of the whole dialogue system. Improving the system's ability to recognize users' text input serves users more efficiently while reducing the pressure on human customer service and lowering corporate expenses.

The data used in this experiment came from a bicycle-sharing company. An analysis of 1200 user input samples that the robot could not identify shows that about 26% of them could have been resolved by measures such as supplementing the knowledge base. This paper processes all the collected samples, aiming to obtain a model that classifies better than the customer service robot and thereby to improve the robot's intent recognition ability.

Preliminary analysis shows that dialogue data share some common characteristics: multiple entity names, absence of tags, and short length. A new word discovery algorithm is used to find new words in the texts. Setting thresholds on word frequency and mutual information during new word discovery avoids some unnecessary calculations. The analysis also shows that multi-word terms have high mutual information and low left-right information entropy, and this feature was used to discover multi-word terms.

Secondly, this paper makes full use of the labeled data, using a keyword extraction method and a text similarity algorithm to simplify data labeling. The keyword extraction algorithm finds the keywords of each category, and the text similarity algorithm calculates the similarity between two texts. For the similarity calculation, this paper combines a character-based algorithm with a word-vector-based algorithm, adjusting the edit distance algorithm to preserve the similarity between words.

Furthermore, considering the running speed requirements, this paper uses a lightweight neural network. Based on research and analysis of convolutional neural network algorithms, a char-word mixture word representation is used as the text representation, extracting information at different granularities. In this experiment, the samples were trained with Word2vec to obtain word-level word vectors, and an embedding layer was used to obtain character-level word vector representations. After a matrix transformation aligned the two granularities, the word-level and character-level vectors were concatenated to obtain the char-word mixture word representation.

Comparing the experimental results shows that the series of measures taken in this paper improved the classification accuracy on the text data. The new word discovery algorithm improves the word segmentation and, to a certain extent, the classification accuracy. Compared with other models, the convolutional neural network based on the char-word mixture word representation achieves higher accuracy and faster running speed. Moreover, the model used in this paper classifies much better than the original one, which also shows that the method can effectively improve the intent recognition ability of the customer service robot.
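
The new word discovery step described in the abstract relies on three statistics: candidate frequency, mutual information, and left-right information entropy. The sketch below illustrates how these can be computed, with the frequency and mutual-information thresholds applied first so that entropy is only calculated for surviving candidates. It is not the thesis's implementation. The function names, the restriction to two-character candidates, the pointwise mutual information formula, and the default thresholds are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def entropy(neighbors):
    """Shannon entropy (natural log) of a Counter of neighboring characters."""
    total = sum(neighbors.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in neighbors.values())


def score_bigrams(texts, min_freq=2, min_pmi=0.0):
    """Return {bigram: (freq, pmi, left_entropy, right_entropy)} for the
    two-character candidates that pass both thresholds (hypothetical helper)."""
    char_counts = Counter()
    gram_counts = Counter()
    left = defaultdict(Counter)   # characters seen immediately before each bigram
    right = defaultdict(Counter)  # characters seen immediately after each bigram

    for text in texts:
        char_counts.update(text)
        for i in range(len(text) - 1):
            gram = text[i:i + 2]
            gram_counts[gram] += 1
            if i > 0:
                left[gram][text[i - 1]] += 1
            if i + 2 < len(text):
                right[gram][text[i + 2]] += 1

    total_chars = sum(char_counts.values())
    total_grams = sum(gram_counts.values())
    scores = {}
    for gram, freq in gram_counts.items():
        # Frequency threshold: skip rare candidates before any further work.
        if freq < min_freq:
            continue
        p_gram = freq / total_grams
        p_a = char_counts[gram[0]] / total_chars
        p_b = char_counts[gram[1]] / total_chars
        pmi = math.log(p_gram / (p_a * p_b))
        # Mutual-information threshold: skip before computing the entropies.
        if pmi < min_pmi:
            continue
        scores[gram] = (freq, pmi, entropy(left[gram]), entropy(right[gram]))
    return scores


# Toy usage on made-up strings (not data from the thesis).
samples = ["xaby", "zabw", "qabq"]
for gram, stats in score_bigrams(samples).items():
    print(gram, stats)
```

How the resulting scores are turned into accept or reject decisions (for example, which entropy range marks a multi-word term) depends on thresholds that the abstract does not state, so that step is left out here.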

Keywords/Search Tags:

Text Classification, New Word Discovery, Text Similarity, Char-Word Mixture Word Representation

Related items

1	Research On Word Similarity Computation Method Based On Non-IID Learning
2	Research On Chinese Text Similarity Detection Technology Based On Word Weight Analysis
3	Research Of Network Bad Word Discovery Model Based On Designing Idea Of AlphaGo
4	Research On The Representations Of Word And Text And Text Classification
5	Research And Application Of Internet Chinese Text Classification
6	Dynamic Weighting Of Word Embedding And Distributed Learning Strategies
7	Research On Text Similarity Algorithm Based On VSM Combined With Word Semantics
8	Study On Chinese Text Similarity Computing Based On Word Segmentation
9	Research On Text Classification Based On Word Vector
10	Research On A Text Classification Method Based On The Concatenated Of Word Vector And Doc2vec