
Research On Classification Algorithm Of Scientific Papers Based On Topic Model

Posted on: 2019-06-05
Degree: Master
Type: Thesis
Country: China
Candidate: D R Wu
GTID: 2428330548461185
Subject: Engineering
Abstract/Summary:
In recent years, technology has advanced rapidly and the number of scientific papers has grown with it. Scientific papers are a crucial part of that progress, so making good use of them matters for research. High-quality classification of scientific papers helps researchers quickly find what they need among a vast body of resources, filter out redundant information, and obtain search results faster and more accurately. It is also an important prerequisite for managing scientific and technological resources. This thesis studies how to classify scientific papers effectively while preserving their thematic characteristics.

We first survey domestic and international research to understand the tools that currently help users work with scientific papers efficiently, how scientific papers are classified, and which classification methods are commonly used. At present, journals and search engines classify the collections of scientific papers they hold, and existing tools based on co-citation analysis, such as HistCite and CiteSpace, help users read and learn from the scientific and technical literature more effectively. Most journals, Bioinformatics for example, define their own classification criteria.

In essence, classifying scientific papers is classifying unstructured text. In 1958 Luhn first proposed a word-frequency statistics method in his work on extracting abstracts. Bayesian algorithms were gradually applied on that basis, and manually constructed classifiers then became dominant. In recent years, support vector machines and other machine learning methods have been widely used in text classification. Applying these traditional text classification methods to scientific papers ignores the semantic features the papers contain, along with the relationships between topics and documents and between words and topics. A topic model is therefore a better-founded way to classify scientific papers.

Training a traditional topic model requires repeated iteration, which costs considerable time and computation, and the model is prone to the component collapsing problem, so mitigating these drawbacks is important. To address them, we apply Auto-Encoding Variational Bayes (AEVB), introduced by Kingma and Welling (Rezende et al. proposed a closely related method in 2014), to the topic model. Specifically, we combine Latent Dirichlet Allocation (LDA) with AEVB: the original decoder is replaced with LDA, the topic vector is connected to a discriminator, and LDA, which is originally unsupervised, becomes a semi-supervised algorithm. A large number of articles from Web of Science serve as the experimental data for classification, and the results are compared with those of traditional classification methods and analyzed.

The main contribution of this thesis is a new topic model based on the variational autoencoder for classifying scientific papers. The method focuses on the thematic nature of each paper during classification, and it addresses the reparameterization and component collapsing problems that arise when a variational autoencoder is applied to a topic model. Compared with the other methods, it significantly improves classification accuracy and shortens training time.
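The combination of an AEVB encoder, an LDA-style decoder, and a discriminator on the topic vector can be illustrated with a single forward pass. The following is only a minimal sketch under stated assumptions, not the thesis's implementation: the layer sizes, the linear encoder, the names `forward`, `W_mu`, `W_logvar`, `beta_logits`, and `W_cls` are all hypothetical, the discriminator is assumed to be a softmax classifier over category labels, and a Gaussian latent passed through a softmax stands in for the Dirichlet prior (a common substitution that makes the reparameterization trick applicable). Parameters are random and untrained, so the numbers carry no meaning beyond showing the shapes and loss terms.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    # Numerically stable softmax over the last axis.
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical sizes: vocabulary, topics, classes, batch.
V, K, C, B = 50, 5, 3, 4

# Randomly initialised, untrained parameters (illustration only).
W_mu = rng.normal(0.0, 0.1, (V, K))         # encoder: mean of the latent
W_logvar = rng.normal(0.0, 0.1, (V, K))     # encoder: log-variance of the latent
beta_logits = rng.normal(0.0, 0.1, (K, V))  # decoder: topic-word logits
W_cls = rng.normal(0.0, 0.1, (K, C))        # discriminator on the topic vector

def forward(bow, labels):
    """One forward pass; labels == -1 marks an unlabeled document."""
    # Encoder: bag-of-words counts -> Gaussian parameters.
    mu = bow @ W_mu
    logvar = bow @ W_logvar
    # Reparameterization: z = mu + sigma * eps, with eps ~ N(0, I).
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps
    # Softmax maps z to document-topic proportions (assumed stand-in
    # for a Dirichlet-distributed topic vector).
    theta = softmax(z)
    # LDA-style decoder: each document's word distribution is a
    # mixture of topic-word distributions.
    beta = softmax(beta_logits)
    word_probs = theta @ beta
    recon = -(bow * np.log(word_probs + 1e-10)).sum(axis=1)
    # KL divergence between N(mu, sigma^2) and N(0, I).
    kl = 0.5 * (np.exp(logvar) + mu**2 - 1.0 - logvar).sum(axis=1)
    # Discriminator: cross-entropy only where a label is available,
    # which is what makes the objective semi-supervised.
    class_probs = softmax(theta @ W_cls)
    labeled = labels >= 0
    ce = np.zeros(len(labels))
    ce[labeled] = -np.log(class_probs[labeled, labels[labeled]] + 1e-10)
    return theta, word_probs, recon + kl, ce

# Toy batch: word counts for 4 documents, two of them unlabeled.
bow = rng.integers(0, 3, size=(B, V)).astype(float)
labels = np.array([0, 2, -1, -1])
theta, word_probs, neg_elbo, ce = forward(bow, labels)
print(theta.shape, word_probs.shape, neg_elbo.shape, ce.shape)
```

In a trained model the total loss would combine `neg_elbo` for every document with `ce` for the labeled ones. How the two terms are weighted is a design choice the abstract does not specify.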
Keywords/Search Tags: topic model, text categorization, variational autoencoder, Latent Dirichlet Allocation