
Realization Of Text Classification And Recognition Based On NLP Method

Posted on: 2022-10-05
Degree: Master
Type: Thesis
Country: China
Candidate: H Xu
GTID: 2507306509989099
Subject: Applied Statistics
Abstract/Summary:
Natural language processing is a discipline that integrates computer science, mathematics, and linguistics, and text classification and recognition is an important research direction within it. In the era of big data, data is closely tied to people's lives. Text has become the most common type of electronic data because of its small storage footprint and strong descriptive power. Faced with vast amounts of text data, how to extract the required information accurately while remaining efficient has become a direct and practical problem. This thesis studies text classification models, drawing on text-processing techniques from natural language processing and on the theory and methods of machine learning. The experiments are implemented in Python, and the main work is as follows:

(1) This thesis reviews the theory of text classification, describes its development history and current state, and introduces the text data processing pipeline. In the experiments, the TF-IDF method is used to build text features, and three word segmentation tools, jieba, SnowNLP, and pkuseg, are compared for segmenting the data. The conclusion is that pkuseg gives the best results on the data used in this thesis.

(2) This thesis evaluates algorithm performance using the weighted F1 value, accuracy, and other indicators. In addition to basic algorithms such as KNN, decision tree, and support vector machine, the experiments use ensemble learning methods for text classification, namely random forest, Gradient Boosting Tree, XGBoost, and LightGBM. Comparative experiments show that the ensemble models perform better overall than the basic models.

(3) This thesis combines the four basic models and the four ensemble models, respectively, using the Stacking integration strategy. Comparative experiments show that, with some exceptions, the models obtained by Stacking fusion perform better than single models, so the Stacking fusion strategy shows a clear overall advantage. The best result comes from the Stacking ensemble model with the Gradient Boosting Tree as the secondary learner: its weighted F1 value is 0.908822 and its accuracy is about 90.73%. This supports the conclusion that the Stacking ensemble algorithm is an effective and accurate text classification algorithm.
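The abstract reports a weighted F1 value and an accuracy but does not spell out how they are computed. The following is a minimal sketch, assuming the common definition of weighted F1 (the per-class F1 scores averaged with weights proportional to each class's number of true samples). The function names and the example labels are hypothetical and are not taken from the thesis.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores (assumed definition)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    support = Counter(y_true)      # true samples per class
    predicted = Counter(y_pred)    # predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    total = 0.0
    for label, n_true in support.items():
        tp = correct[label]
        precision = tp / predicted[label] if predicted[label] else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        total += n_true * f1       # weight each class by its support
    return total / len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical labels, for illustration only.
    y_true = ["sports", "sports", "finance", "finance", "finance", "tech"]
    y_pred = ["sports", "finance", "finance", "finance", "tech", "tech"]
    print("weighted F1:", weighted_f1(y_true, y_pred))
    print("accuracy:", accuracy(y_true, y_pred))
```

Weighting by class support means large classes dominate the score, which is one reason weighted F1 and accuracy can land close to each other, as they do in the figures reported above.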
Keywords/Search Tags: Natural Language Processing, Text Classification, Machine Learning, Stacking Ensemble Algorithm