Formulaic language is a multi-word unit that appears frequently as a whole, with its component words either contiguous or discontinuous, and it generally carries a clear meaning and function. Research on the recognition and classification of formulaic language helps improve the standardization of text expression, the accuracy of semantic mining, the fidelity of machine translation, and the logical coherence of intelligent question answering. However, traditional research relies mainly on linguists to identify and classify formulaic language by hand, which is costly and inefficient. In recent years, some researchers have proposed methods for the automatic recognition and classification of formulaic language based on statistical machine learning, but these methods often fail to strike a balance between efficiency and accuracy. To address the high cost of manual recognition and the poor performance of automatic classification, this article proposes a deep-learning-based method for the recognition and classification of formulaic language. The main contributions and innovations are as follows:

(1) Existing methods lack a coarse corpus-screening step, so the pool of candidate samples for formulaic language recognition is large and heterogeneous, and recognition is inefficient. This paper therefore proposes a method for predicting which sentences contain formulaic language based on multi-feature fusion. The method first constructs a classification model that decides whether a sentence contains formulaic language: the model combines the semantic and part-of-speech features of each sentence by late fusion and predicts the probability that the input sample contains formulaic language. Sentences whose probability exceeds a threshold are then retained as samples for subsequent formulaic language recognition, so this initial screening reduces the sample size and improves recognition efficiency. Experiments on the academic phrase library and collections of papers show that the method filters the coarse corpus effectively, laying a foundation for subsequent research on formulaic language recognition.

(2) Incomplete feature extraction in existing methods leads to low accuracy in formulaic language recognition. This paper therefore proposes a formulaic language recognition method based on a GCN that fuses association information. Because the words that make up a formulaic expression co-occur frequently and are strongly correlated, the method constructs each sentence as a graph: the words of the sentence are nodes, late-fused part-of-speech and semantic features serve as the basic node features, and the edges connecting nodes are determined from the pointwise mutual information between words and their dependency-syntax relations. A graph convolutional network then extracts the association information between words. Finally, the extracted feature information is fed into a conditional random field for decoding, which assigns each word a label category so that formulaic language can be recognized. Experimental results show that the F1 score of this method reaches 83.5%, significantly higher than existing methods, verifying its effectiveness in recognizing formulaic language.

(3) A single classifier struggles to exceed its own performance ceiling, so existing methods classify poorly. This paper therefore proposes a formulaic language classification method based on Bi-LSTM and Stacking. The method uses GloVe and a Bi-LSTM to extract features from text and introduces the Stacking ensemble learning algorithm: using the Pearson correlation coefficient, it selects Logistic Regression, Random Forest, Multilayer Perceptron, and K-Nearest Neighbors as weakly correlated base classifiers, with Random Forest as the meta-classifier. Finally, the model's performance is evaluated on the meta-classifier's predictions. Comparative experiments show that the Precision, Recall, and F1 score of our method are 1.36%, 2.56%, and 2.64% higher, respectively, than those of a Bagging ensemble learning model. This verifies that the method can integrate the classification results of multiple single classifiers and further improve classification performance.
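The abstract gives no implementation details for the late-fusion screening step in contribution (1). As a minimal illustrative sketch, the following assumes each sentence is already represented by a semantic feature vector and a part-of-speech feature vector, and that each view has its own (hypothetical) linear scorer; the two views' probabilities are fused late by averaging and compared against the retention threshold:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def late_fusion_filter(sem_feats, pos_feats, w_sem, w_pos, threshold=0.5):
    """Score each sentence with two independent views, then fuse late.

    sem_feats, pos_feats: (n_sentences, d) feature matrices per view.
    w_sem, w_pos: per-view linear weights (stand-ins for trained models).
    Returns fused probabilities and a boolean mask of retained sentences.
    """
    p_sem = sigmoid(sem_feats @ w_sem)   # probability from the semantic view
    p_pos = sigmoid(pos_feats @ w_pos)   # probability from the POS view
    p = 0.5 * (p_sem + p_pos)            # late fusion: average the view probabilities
    keep = p >= threshold                # retain likely formulaic-language sentences
    return p, keep
```

The weights and the averaging rule here are placeholders; the paper's actual model could weight the views unequally or fuse with a learned layer.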
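Contribution (2) builds graph edges from the pointwise mutual information (PMI) between words. The abstract does not specify the counting scheme, so the sketch below assumes a common convention: co-occurrence is counted over a sliding window, and an edge is kept when PMI exceeds a threshold (the dependency-syntax edges mentioned in the text are omitted here):

```python
import math
from collections import Counter
from itertools import combinations

def pmi_edges(sentences, window=2, threshold=0.0):
    """Connect word pairs whose pointwise mutual information exceeds
    a threshold, estimated from sliding-window co-occurrence counts."""
    word_count = Counter()   # windows containing each word
    pair_count = Counter()   # windows containing each word pair
    total_windows = 0
    for sent in sentences:
        for i in range(max(1, len(sent) - window + 1)):
            win = sent[i:i + window]
            total_windows += 1
            for w in set(win):
                word_count[w] += 1
            for a, b in combinations(sorted(set(win)), 2):
                pair_count[(a, b)] += 1
    edges = {}
    for (a, b), n_ab in pair_count.items():
        p_ab = n_ab / total_windows
        p_a = word_count[a] / total_windows
        p_b = word_count[b] / total_windows
        pmi = math.log(p_ab / (p_a * p_b))  # PMI = log p(a,b) / (p(a) p(b))
        if pmi > threshold:
            edges[(a, b)] = pmi
    return edges
```

Pairs that co-occur more often than chance predicts get positive PMI and become edges, which matches the abstract's observation that the words of a formulaic expression co-occur with high frequency.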
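Contribution (3) selects base classifiers with low pairwise Pearson correlation before stacking them. The abstract does not say how the selection is performed, so this sketch assumes one plausible scheme: compute the correlation between each candidate classifier's held-out prediction vectors and greedily keep only classifiers that correlate weakly with those already chosen:

```python
import numpy as np

def select_low_correlation(preds, max_corr=0.9):
    """Greedily keep classifiers whose held-out predictions correlate
    below max_corr (in absolute value) with every classifier kept so far.

    preds: dict name -> 1-D array of predicted probabilities on a
    shared held-out set (the names and threshold are illustrative).
    """
    names = list(preds)
    # Pairwise Pearson correlations between prediction vectors.
    mat = np.corrcoef(np.stack([preds[n] for n in names]))
    selected = []
    for i, name in enumerate(names):
        kept_idx = [names.index(s) for s in selected]
        if all(abs(mat[i, j]) < max_corr for j in kept_idx):
            selected.append(name)
    return selected
```

Diverse (weakly correlated) base classifiers give the Stacking meta-classifier complementary errors to combine, which is the rationale the abstract gives for the Pearson-based selection.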