
Research On Feature Selection Method For Short Text

Posted on: 2022-12-07 | Degree: Master | Type: Thesis
Country: China | Candidate: S J Huang | Full Text: PDF
GTID: 2518306746483064 | Subject: Master of Engineering
Abstract/Summary:
With the arrival of the big data era, the Internet has become part of everyday life, and information now appears in many forms: voice, video, image, and text. Among these, text has spread especially widely because it is fast to transmit and takes little storage, so the volume of text data has grown sharply. Extracting the most useful information from large amounts of text has become an active research topic. Extracting information from text first requires classifying it, which is the task of text classification. Short text is sparse and ambiguous, so classifying it is one of the more challenging tasks in the field. Classification performance depends largely on the quality of text feature selection, which makes feature selection an important direction to explore.

This thesis studies feature selection algorithms for short text, analyzes and improves them in light of the characteristics of short text, and verifies them on film review and news text data. It focuses on improving the information gain algorithm and, drawing on deep learning, introduces the BERT model and the attention mechanism. The main work covers the following aspects.

First, theoretical background. The thesis reviews the definition of each stage of the classification process, the related ideas, the classification methods, and the important algorithmic steps. It describes popular feature selection approaches with their benefits and drawbacks, introduces the input, output, and preprocessing procedure of the BERT model used in the research, and reviews the relevant background on attention mechanisms.

Second, an improvement to the information gain algorithm. Analysis of the information gain formula identifies two shortcomings: it ignores word frequency, and it takes into account the case in which a feature is absent, where the interference with classification outweighs the contribution. To address these, a word frequency factor and a balance factor are introduced, respectively. Because feature selection algorithms also ignore part of speech, a part-of-speech filtering step is added. The resulting improved information gain method effectively enhances feature selection.

Third, the introduction of BERT and the attention mechanism. Based on an analysis of the characteristics of short text, the BERT model and the attention mechanism are combined with the improved algorithm. An improved feature selection model based on the attention mechanism is proposed to address the sparsity, ambiguity, and strong contextual dependence of short text features.

Finally, experiments and analysis of results. The improved information gain algorithm and the improved feature selection model are verified experimentally on binary and multi-class short text datasets. Evaluation by accuracy, recall, and F1 score shows that both the improved algorithm and the improved model are effective for feature selection.
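
For orientation, the textbook information gain criterion that the thesis takes as its starting point can be sketched as below. This is a minimal illustration of the standard, unmodified formula, IG(t) = H(C) - P(t)·H(C|t) - P(¬t)·H(C|¬t), and not the improved algorithm: the abstract does not give the formulas for the word frequency factor or the balance factor, so they are not reproduced here. The function names (`entropy`, `information_gain`, `select_features`) and the toy data are assumptions made for the example. The sketch does make both shortcomings visible: a term counts as present or absent regardless of how often it occurs, and the absent-term branch contributes to the score just as the present-term branch does.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a class-count distribution."""
    counts = list(counts)
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """Standard information gain of `term` with respect to the class labels.

    docs   -- list of token lists (one per short text)
    labels -- list of class labels, aligned with docs
    Only presence/absence of the term is used; word frequency is ignored.
    """
    n = len(docs)
    present = Counter(lab for doc, lab in zip(docs, labels) if term in doc)
    absent = Counter(lab for doc, lab in zip(docs, labels) if term not in doc)
    n_present = sum(present.values())
    n_absent = n - n_present

    h_class = entropy(Counter(labels).values())
    h_present = entropy(present.values())   # H(C | term present)
    h_absent = entropy(absent.values())     # H(C | term absent)
    return h_class - (n_present / n) * h_present - (n_absent / n) * h_absent


def select_features(docs, labels, k):
    """Return the k terms with the highest information gain (ties broken alphabetically)."""
    vocab = {tok for doc in docs for tok in doc}
    ranked = sorted(vocab, key=lambda t: (-information_gain(docs, labels, t), t))
    return ranked[:k]


if __name__ == "__main__":
    # Toy film-review-style data, invented for illustration only.
    docs = [["great", "film"], ["bad", "film"], ["great", "acting"], ["bad", "plot"]]
    labels = ["pos", "neg", "pos", "neg"]
    print(select_features(docs, labels, 2))
```

In this toy data, "great" and "bad" separate the two classes perfectly and receive the maximum score, while "film" appears equally in both classes and scores zero.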
Keywords/Search Tags: Feature selection, Information gain, Attention mechanism, BERT