Realization Of Text Classification And Recognition Based On NLP Method

Posted on:2022-10-05

Degree:Master

Type:Thesis

Country:China

Candidate:H Xu

GTID:2507306509989099

Subject:Applied Statistics

Abstract/Summary:

Natural language processing (NLP) is a discipline that integrates computer science, mathematics, and linguistics. Text classification and recognition is regarded as an important research area and direction within NLP. Humanity has entered the era of big data, and data is closely tied to people's daily lives. Text has become the most common type of electronic data because it takes up little storage and is highly descriptive. Faced with a vast amount of text data, how to extract the required information accurately while remaining efficient has become a direct and practical problem. This thesis studies text classification models by drawing on text-processing techniques from NLP and on the related theory and techniques of machine learning. The experiments are implemented in Python, and the main research work is as follows:

(1) The thesis reviews the relevant theory of text classification, describes its development history and current state, and introduces the text data processing pipeline. In the experiments, TF-IDF is used to construct text features. Three word segmentation tools, jieba, SnowNLP, and pkuseg, are compared for segmenting the data. The conclusion is that pkuseg works best on the data used in this thesis.

(2) Algorithm performance is evaluated with the weighted F1 score, accuracy, and other metrics. Besides basic algorithms such as KNN, decision tree, and support vector machine, the experiments also apply ensemble learning methods to text classification, including random forest, gradient boosting tree, XGBoost, and LightGBM. Comparative experiments show that the ensemble models perform better overall than the basic models.

(3) The four basic models and the four ensemble models are combined, respectively, through the Stacking ensemble strategy. Comparative experiments show that the Stacking-fused models generally outperform the single models, with some exceptions. Overall, the Stacking strategy shows some advantages. The best-performing model is the Stacking ensemble with a gradient boosting tree as the second-level (meta) learner. It achieves a weighted F1 score of 0.908822 and an accuracy of about 90.73%. This supports the conclusion that the Stacking ensemble algorithm is an effective and accurate method for text classification.
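Point (1) uses TF-IDF to turn segmented text into features. The abstract does not state which TF-IDF variant was used, so the sketch below assumes the textbook form: term frequency normalised by document length, multiplied by log(N / df). It also assumes the documents have already been split into token lists by a segmenter such as jieba, SnowNLP, or pkuseg. The function name `tfidf` and the sample tokens are illustrative only.

```python
from collections import Counter
from math import log


def tfidf(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists, e.g. the output of a Chinese word segmenter.
    Assumed weighting: tf = count / document length, idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term at least once.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc)
        weights.append({t: (c / length) * log(n_docs / df[t]) for t, c in counts.items()})
    return weights


if __name__ == "__main__":
    # Hypothetical, already-segmented documents (stock/rise, team/win, stock/fall).
    docs = [["股票", "上涨", "股票"], ["球队", "赢球"], ["股票", "下跌"]]
    for row in tfidf(docs):
        print(row)
```

A term that appears in every document gets weight 0. In the first sample document, 上涨 outweighs the more frequent 股票 because 股票 also occurs in another document. Library implementations such as scikit-learn's `TfidfVectorizer` apply idf smoothing and L2 normalisation by default, so their numbers will differ from this sketch.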
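Point (2) reports a weighted F1 score alongside accuracy. This sketch assumes the usual definition of weighted F1: the per-class F1 scores are averaged, and each class is weighted by its number of true samples (its support). The pure-Python code below computes both metrics. The function names are illustrative.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores."""
    support = Counter(y_true)
    total = 0.0
    for label, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        # With no true positives both precision and recall are 0, so F1 is 0.
        f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
        total += n_true * f1
    return total / len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

Consider y_true = [a, a, a, b] and y_pred = [a, a, b, b]. Class a has F1 0.8 and class b has F1 2/3. The weighted F1 is therefore (3 × 0.8 + 1 × 2/3) / 4 ≈ 0.767, while accuracy is 0.75. This is the quantity scikit-learn's `f1_score` returns with `average="weighted"`.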
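Point (3) relies on Stacking. The sketch below shows the standard out-of-fold procedure in pure Python:

- Each base learner is trained on k-1 folds and predicts the held-out fold.
- These predictions become the input features of a second-level (meta) learner.
- The base learners are then refit on all the training data to produce test-time features.

The learners `NearestCentroid`, `OneNN`, and `LookupMeta` are hypothetical toy stand-ins, not the thesis's KNN, SVM, or gradient boosting models. The lookup-table meta learner replaces the gradient boosting tree purely to keep the example short. The interleaved fold split is also an assumption, since the abstract does not describe how folds were formed.

```python
from collections import Counter, defaultdict
from math import dist


class NearestCentroid:
    """Toy base learner (hypothetical): predicts the class with the closest mean vector."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for x, label in zip(X, y):
            vec = sums.setdefault(label, [0.0] * len(x))
            for i, v in enumerate(x):
                vec[i] += v
        self.centroids = {c: [s / counts[c] for s in vec] for c, vec in sums.items()}
        return self

    def predict(self, X):
        return [min(self.centroids, key=lambda c: dist(x, self.centroids[c])) for x in X]


class OneNN:
    """Toy base learner (hypothetical): 1-nearest-neighbour, i.e. KNN with k = 1."""

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, X):
        return [self.y[min(range(len(self.X)), key=lambda i: dist(x, self.X[i]))] for x in X]


class LookupMeta:
    """Toy meta learner (hypothetical): majority true label per tuple of base predictions."""

    def fit(self, Z, y):
        votes = defaultdict(Counter)
        for z, label in zip(Z, y):
            votes[tuple(z)][label] += 1
        self.table = {z: c.most_common(1)[0][0] for z, c in votes.items()}
        self.default = Counter(y).most_common(1)[0][0]  # fallback for unseen tuples
        return self

    def predict(self, Z):
        return [self.table.get(tuple(z), self.default) for z in Z]


def stacking_fit_predict(base_factories, meta_factory, X, y, X_test, k=3):
    """Two-level Stacking with k-fold out-of-fold (OOF) meta features."""
    n = len(X)
    folds = [list(range(f, n, k)) for f in range(k)]  # interleaved split (an assumption)
    oof = [[None] * len(base_factories) for _ in range(n)]
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(n) if i not in held]
        for j, make in enumerate(base_factories):
            model = make().fit([X[i] for i in train], [y[i] for i in train])
            # Each sample's meta feature comes from a model that never saw it.
            for i, p in zip(held_out, model.predict([X[i] for i in held_out])):
                oof[i][j] = p
    meta = meta_factory().fit(oof, y)
    # Refit every base learner on all training data to build test-time meta features.
    test_preds = [make().fit(X, y).predict(X_test) for make in base_factories]
    return meta.predict([list(row) for row in zip(*test_preds)])
```

In practice, scikit-learn's `StackingClassifier` implements the same idea. Its `final_estimator` parameter could be set to a `GradientBoostingClassifier` to mirror the configuration the abstract reports as best. The thesis's exact hyperparameters are not given in this summary.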

Keywords/Search Tags:

Natural Language Processing, Text Classification, Machine Learning, Stacking Ensemble Algorithm

Related items

1	Research On Semantic Classification Model Of Teaching Evaluation Based On Feature Weighted Stacking Algorithm
2	Research On Forecasting Esg Ratings Based On Stacking Algorithm
3	Prediction Of Shoppers’ Purchasing Intention Based On Stacking Ensemble Classification
4	Research On Chinese Web Forum Based On Natural Language Processing
5	Research On Network Public Opinion Classification Algorithm Based On Machine Learning
6	CPI Prediction Research Based On Machine Learning Theory
7	Research On Student Achievement Prediction Model Based On Ensemble Learning Algorithm
8	A Course Recommendation Algorithm Based On Sentiment Analysis Of Pop-up Text Comments
9	Research On Multi-task Text Analysis Based On BERT
10	Research On Ensemble Learning Algorithm Of Classification Based On Cost-sensitive