With the development of natural language processing, research on the accuracy and depth of short text analysis continues to deepen. Using natural language processing techniques to analyze massive short text data, extract valuable information from it, and mine it further in light of domain business requirements is becoming an urgent need. This paper combines domain knowledge graphs with semantic analysis of massive short text data, based primarily on short text classification, in order to better explore the latent semantic information that short texts express in domain business. From implementations based on traditional machine learning to current implementations based on deep learning, the effectiveness of short text classification has improved continuously. However, shortcomings remain in three respects: (1) short text classification datasets are few and their label categories are unevenly distributed, which limits the generalization ability of algorithms; (2) short texts do not follow grammatical rules and are colloquial, making it difficult for algorithms to identify their intrinsic semantics; (3) short texts lack context and domain information, which further limits the algorithm’s classification performance. To address these shortcomings, the research in this paper focuses on the following four aspects:

(1) A data augmentation method based on a domain knowledge graph, DCKGDA (Domain Class Knowledge Graph Data Augmentation), is proposed. To address the problem of limited datasets, external knowledge such as domain knowledge graphs is used. The method retrieves the entity nodes in a sentence that are related to label categories and replaces these words and phrases with coexisting nodes, superordinate words (hypernyms), or subordinate words (hyponyms) in the graph while preserving the original semantics of the sentence, generating more diverse data with stronger generalization ability. Under the guidance of domain experts, a batch of raw customer service evaluation data from the automotive field was annotated, and a full-sample test was run on this dataset to verify the effectiveness of the proposed data augmentation method.

(2) A short text classification method based on a pre-trained model is proposed to address the problem that short texts do not follow grammatical rules and are colloquial. First, a label classification system is constructed from the domain business requirements, and a short text classification model based on a pre-trained model is then built on top of the proposed data augmentation method. To address polysemy in static word vectors and improve the overall understanding of short texts, the dynamic word vectors of the ALBERT pre-trained model replace the word embedding layer of the TextCNN model. TextCNN convolves and pools the semantic features and outputs the normalized label category. Experimental results show that, compared with classical methods, the pre-trained model and the data augmentation method effectively improve short text classification performance.

(3) A knowledge-based attention ALBERT convolutional neural network (KAACNN) method is proposed, which integrates knowledge graph and label information for intent recognition. It comprises four modules: short text encoding, concept information encoding, label information encoding, and multidimensional intent recognition. Because short texts are brief and lack context and domain information, which hampers intent recognition, the method merges the normalized label categories output by the pre-trained-model-based short text classification method with the short text encoding. It also retrieves conceptual knowledge for the short text from the knowledge graph and uses an attention mechanism to compute the importance of each concept to the short text. This limits the impact of knowledge noise during knowledge fusion while enriching the semantic information of the short text with conceptual knowledge from the graph. Finally, a multidimensional intent analysis system is constructed based on business needs, and experimental results show that the four modules proposed in this study effectively address the existing problems of short texts and improve the accuracy of short text intent recognition.

(4) A short text semantic analysis system based on a domain knowledge graph is designed and developed. Building on the models and algorithms above and on actual business application scenarios, it implements domain knowledge graph construction and visualization, intelligent semantic analysis of customer feedback, data statistics and analysis, and report generation. Targeting the data analysis scenario of service evaluation for automobile dealers, the system automatically classifies and mines customer feedback, providing valuable reference for automobile dealers and helping to improve customer satisfaction and service quality.
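
As an illustration of the replacement idea behind the data augmentation method in aspect (1), the following is a minimal sketch, not the DCKGDA implementation itself. It assumes the domain knowledge graph can be viewed as a mapping from an entity to its coexisting, superordinate, and subordinate nodes; the graph contents, the name `TOY_GRAPH`, and the function `augment` are hypothetical and invented for this example. Matching is done by plain substring search, which a real system would likely replace with proper entity linking.

```python
from typing import Dict, List

# Hypothetical toy graph: entity -> related nodes grouped by relation type.
# The entities and relations are invented for illustration only.
TOY_GRAPH: Dict[str, Dict[str, List[str]]] = {
    "brake pads": {
        "coexisting": ["brake discs"],
        "superordinate": ["brake system"],
        "subordinate": [],
    },
    "oil change": {
        "coexisting": ["filter replacement"],
        "superordinate": ["maintenance"],
        "subordinate": [],
    },
}


def augment(sentence: str, graph: Dict[str, Dict[str, List[str]]]) -> List[str]:
    """Return variants of `sentence` with graph entities swapped for related nodes.

    Each variant replaces one matched entity with one related node. The
    original label is assumed to carry over to every variant unchanged.
    """
    variants: List[str] = []
    for entity, relations in graph.items():
        if entity not in sentence:  # naive substring match (assumption)
            continue
        for related_nodes in relations.values():
            for node in related_nodes:
                candidate = sentence.replace(entity, node)
                if candidate != sentence and candidate not in variants:
                    variants.append(candidate)
    return variants


if __name__ == "__main__":
    for variant in augment("the oil change took too long", TOY_GRAPH):
        print(variant)
```

Generating every single-entity substitution keeps the sketch deterministic; sampling a subset of substitutions would be a reasonable alternative when the graph is large.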