Research On Short Text Classification For Tender Project Name

Posted on:2018-03-31

Degree:Master

Type:Thesis

Country:China

Candidate:H H Shi

Full Text:PDF

GTID:2348330542468708

Subject:Computer software and theory

Abstract/Summary:

Short texts such as messages, online product reviews, and Weibo posts are growing explosively, and they play an important role in information transmission. Long texts carry rich semantic features. Short texts do not, and the sparse feature matrices they produce make classification and deeper data mining difficult. Topic models are a relatively mature technique for mining long texts, yet short text processing has largely remained within the long-text framework. Recent work expands short texts with relevant external information, topic models included. This approach does not generalize well, given the difficulty of finding a relevant corpus for short texts and the dependence on the quality of the related information.

A corpus of bidding project names is a typical Chinese short text data set. In recent years, bidding websites that rely on manual collection and processing have been unable to keep up with an increasingly competitive market, so automated bidding websites are urgently needed. The website associated with this thesis automatically acquires, processes, and analyzes bidding project names. This thesis focuses on the classification problem. Compared with a long-text corpus, the short text data set built from bidding project names collected from a wide variety of websites is sparse. The experimental procedure is described in detail in the thesis.

First, TF-IDF and information gain (IG) are chosen as feature selection methods and combined with a Bayes classifier; classification results are evaluated by F value. Second, the thesis proposes rule-based feature selection methods: using the whole phrase, deleting all words before the first keyword, and weighting all words in the phrase. Of the three rules, weight assignment performs best: precision increases, though recall decreases. Finally, the thesis improves LDA and fuses the IG result with the LDA result, which increases both precision and recall. The effectiveness of this method is verified in the thesis. Going forward, the method can be applied to the classification of Chinese short text data sets.
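
The first step of the pipeline (IG feature selection, a Bayes classifier, evaluation by F value) can be illustrated with a minimal sketch. This is not the thesis's implementation: it assumes project names are already segmented into tokens, uses a multinomial naive Bayes with add-one smoothing as one plausible reading of "Bayes classification method", and runs on a tiny hypothetical English data set. All function names and the toy data are illustrative assumptions.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(docs, labels, term):
    """IG of the binary feature 'term occurs in the document'."""
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    conditional = sum(len(part) / len(labels) * entropy(part)
                      for part in (present, absent) if part)
    return entropy(labels) - conditional

def select_terms(docs, labels, k):
    """Keep the k terms with the highest IG (ties broken alphabetically)."""
    vocab = sorted({t for d in docs for t in d})
    ranked = sorted(vocab, key=lambda t: information_gain(docs, labels, t),
                    reverse=True)
    return ranked[:k]

def train_nb(docs, labels, vocab):
    """Multinomial naive Bayes with add-one smoothing over the selected terms."""
    classes = sorted(set(labels))
    prior = {c: math.log(labels.count(c) / len(labels)) for c in classes}
    counts = {c: Counter() for c in classes}
    for d, y in zip(docs, labels):
        counts[y].update(t for t in d if t in vocab)
    loglik = {}
    for c in classes:
        total = sum(counts[c].values()) + len(vocab)
        loglik[c] = {t: math.log((counts[c][t] + 1) / total) for t in vocab}
    return prior, loglik

def predict_nb(model, doc):
    """Return the class with the highest log posterior; unselected terms are ignored."""
    prior, loglik = model
    return max(prior, key=lambda c: prior[c] +
               sum(loglik[c][t] for t in doc if t in loglik[c]))

def f1_score(y_true, y_pred, positive):
    """F value (harmonic mean of precision and recall) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical, pre-tokenized project names and their categories.
docs = [
    ["road", "construction", "project"],
    ["bridge", "construction", "tender"],
    ["server", "procurement", "project"],
    ["software", "procurement", "tender"],
]
labels = ["engineering", "engineering", "goods", "goods"]

selected = select_terms(docs, labels, k=2)
model = train_nb(docs, labels, selected)
predictions = [predict_nb(model, d) for d in
               (["highway", "construction", "tender"], ["laptop", "procurement"])]
```

In this toy set, terms that appear in only one category receive the highest IG, while a term such as "project" that appears equally in both categories scores zero and is dropped. Real bidding project names would first need Chinese word segmentation, which the sketch leaves out.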

Keywords/Search Tags:

LDA topic model, TF-IDF, IG, naive Bayes, feature selection

Related items

1	The Study Of Chinese Text Categorization Based On Naïve Bayes
2	Research And Application Of Text Classification Model Based On Topic Model
3	Research On News Topic Detection Based On Feature Selection And Word Vector Weighting
4	An Artificial Immune Based Naïve Bayes Model For Software Defect Predict
5	Supervised Latent Dirichlet Allocation Combined With Feature Selection Of Sparsity
6	Classifying Domain Questions Into Subcategories Through Topic Enriching
7	Research On Network Information Dissemination And High Performance Computing
8	Application Of CTM Model Optimization Feature Selection In Text Categorization
9	Research On Feature Expansion And Classification Of Short Text Based On Topic Model And Deep Learning
10	Research And Application Of Topic Selection Scheme Based On The Analysis Of Book Market