
Research On Short Text Classification Of Academic Report Titles Based On Feature Extension

Posted on: 2019-04-15  Degree: Master  Type: Thesis
Country: China  Candidate: S Xia  Full Text: PDF
GTID: 2428330548985952  Subject: Computer application technology
Abstract/Summary:
The number of academic reports published online is large, and finding reports in a field of interest quickly and accurately among these resources has become an urgent problem. Academic reports are normally published as text, so processing them is a typical text classification problem, and text classification is one of the key techniques of data mining. However, the title of an academic report is short, carries little information, and has sparse features, so traditional text classification methods often fail to classify it accurately. This thesis studies these issues in depth. The main research contributions are as follows:

(1) The main task in classifying academic reports is to assign report titles to research areas. However, no standard dataset exists for report title classification. We find that titles of scientific papers are very similar to academic report titles. Therefore, a crawler system is designed to collect a sufficient number of scientific paper titles and abstracts from a title and abstract database, and these are used as the training dataset for academic report classification.

(2) To address the poor classification performance caused by the feature sparsity of academic report titles, a new short-text classification method based on feature extension is proposed. It is applied to the classification of academic report titles and achieves good results. In our experiments, Word2Vec is used as the toolkit for word vector training and feature expansion. We then compare the parameters that need to be adjusted during feature extension, analyze the influence of different parameter values on the classification results, and finally determine the optimal combination of classification parameters.

(3) To address the poor performance of traditional text classification methods on academic report titles, a new short-text classification method is proposed in which multiple classifiers are combined and the classification results are filtered at every step of the classification process by setting various parameters. Experimental results verify the rationality and effectiveness of the proposed method.
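The abstract does not spell out the feature extension procedure, so the following is only a minimal sketch of one common way to expand a short title with word vectors: each title word is supplemented with its nearest neighbours in the embedding space. The toy vectors, the function name `expand_title`, and the parameters `top_k` and `min_sim` are all assumptions made for illustration; they are not taken from the thesis, and a real setup would load vectors from a trained Word2Vec model instead.

```python
import math

# Toy 3-dimensional word vectors, invented for illustration only.
# A real system would take these from a Word2Vec model trained on
# the crawled paper titles and abstracts.
TOY_VECTORS = {
    "neural":   [0.9, 0.1, 0.0],
    "network":  [0.8, 0.2, 0.1],
    "deep":     [0.85, 0.15, 0.05],
    "learning": [0.7, 0.3, 0.1],
    "protein":  [0.0, 0.9, 0.2],
    "genome":   [0.1, 0.85, 0.3],
    "folding":  [0.05, 0.8, 0.1],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_title(tokens, vectors, top_k=2, min_sim=0.95):
    """Append up to top_k similar words per token (hypothetical parameters).

    A neighbour is added only if its cosine similarity to the token is at
    least min_sim. Tokens without a vector are kept but not expanded.
    """
    expanded = list(tokens)
    seen = set(tokens)
    for tok in tokens:
        if tok not in vectors:
            continue
        # Rank every other vocabulary word by similarity to this token.
        neighbours = sorted(
            ((cosine(vectors[tok], vec), word)
             for word, vec in vectors.items() if word != tok),
            reverse=True,
        )
        for sim, word in neighbours[:top_k]:
            if sim >= min_sim and word not in seen:
                expanded.append(word)
                seen.add(word)
    return expanded
```

Calling `expand_title(["protein", "structure"], TOY_VECTORS)` keeps both original words and appends the nearest neighbours of "protein"; "structure" has no vector in the toy vocabulary and is left as-is. Raising `min_sim` or lowering `top_k` adds fewer words, which is the kind of parameter trade-off the abstract says was compared experimentally.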
Keywords/Search Tags: academic report, feature expansion, text classification, threshold parameters