
Research And Implementation Of Science And Technology Policy Classification Method Combining Topic Model And Deep Learning

Posted on: 2020-07-16  Degree: Master  Type: Thesis
Country: China  Candidate: F Sun
GTID: 2428330599958548  Subject: Computer technology
Abstract/Summary:
Science and technology policy is a basic code of conduct for accomplishing a country's scientific and technological tasks over a given period. With the rapid development of science and technology, the volume of science and technology policy text keeps growing. Faced with this volume of text, how to extract valuable information from it and manage it effectively has become an urgent problem for researchers. This thesis studies and implements text classification methods for science and technology policy, based on the projects "Research and Development of Standardized Processing and Application System for Big Data of Science and Technology (172110113D)" and "Integrated Service Platform for Big Data of Science and Technology". According to the characteristics of science and technology policy texts, a classification method based on SSL-SLHDP+PXG is proposed. To further improve classification accuracy, a second method based on WTR-BiGRU is proposed. The main work of this thesis is as follows:

(1) Text classification method for science and technology policy based on SSL-SLHDP+PXG. According to the characteristics of science and technology policy classification, a method for expanding the labeled sample set, SSL-SLHDP (Semi-Supervised Learning-SLHDP), is proposed by combining the SLHDP (Semi-supervised Labeled HDP) model with a semi-supervised method. Because the XGBoost (eXtreme Gradient Boosting) classification algorithm has many hyper-parameters that it cannot tune automatically, an improved XGBoost algorithm, PXG, based on particle swarm optimization is proposed. The SSL-SLHDP+PXG method combines the labeled-sample expansion method with the improved XGBoost algorithm. First, the topic distribution generated by the SLHDP model is used to represent the science and technology policy data set. Then, using the SSL-SLHDP method, unlabeled samples that can be labeled with high confidence are added to the training set, which expands the labeled sample set. Finally, the PXG classification model is trained on the expanded training set to classify science and technology policies.

(2) Text classification method for science and technology policy based on WTR-BiGRU. First, classification of science and technology policy texts is implemented with a bidirectional gated recurrent unit (BiGRU). Because the topic distribution is an important feature for text classification, topic feature vectors are introduced into the BiGRU model. Two improved models, WT-BiGRU-1 and WT-BiGRU-2, are proposed, differing in how the topic features are fused. To address the vanishing-gradient problem that arises as the number of network layers increases, the WTR-BiGRU model is proposed by introducing a residual block structure. A series of comparative experiments was designed on the science and technology policy data set, with macro-F1, micro-F1, loss value and iteration time as evaluation metrics, to verify the effectiveness and superiority of the improved model. The improved method further improves the accuracy and efficiency of text classification.
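
The macro-F1 and micro-F1 metrics named above can be illustrated with a minimal Python sketch. It assumes single-label multi-class classification; the function name `f1_scores` and the category names in the usage example are hypothetical and are not taken from the thesis.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (macro_f1, micro_f1) for single-label multi-class predictions."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1      # correct prediction for class t
        else:
            fp[p] += 1      # class p was predicted but is wrong
            fn[t] += 1      # class t was missed

    def f1(tp_, fp_, fn_):
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Macro-F1: unweighted mean of per-class F1, so rare classes count equally.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro-F1: F1 over counts pooled across all classes.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


# Hypothetical policy categories, for illustration only.
y_true = ["funding", "funding", "funding", "talent"]
y_pred = ["funding", "funding", "talent", "talent"]
macro_f1, micro_f1 = f1_scores(y_true, y_pred)
```

In the single-label setting micro-F1 equals plain accuracy, while macro-F1 drops when small categories are classified poorly, which is presumably why both are reported for policy categories of uneven size.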
Keywords/Search Tags: science and technology policy, topic model, semi-supervised, deep learning, BiGRU, residual block