| With the rapid development of Internet technology and of computer software and hardware, the volume of information on the Internet has grown rapidly. How to obtain data, analyze it, and mine its value has become a research hotspot, and data-related research has broad application prospects. Text mining and text generation are important parts of natural language processing and among its most widely studied areas, and researchers have made many significant achievements in both. The acknowledgments of academic dissertations contain a great deal of useful information that can support meaningful research. Studying acknowledgment data raises several questions: how to obtain large amounts of data, from which perspectives to mine information, and how to generate acknowledgments automatically. To address these problems, this paper does the following work: (1) To acquire data from a large number of academic dissertations, this paper analyzes the structure and characteristics of the CNKI website and designs a Selenium-based web crawler. The crawler collects the basic information and the acknowledgment section of each dissertation from CNKI; the data are then cleaned and used to build a corpus. The results show that the crawler handles data files in special formats well and adapts to a variety of crawling environments, making it an effective way to obtain large amounts of data. (2) To mine useful information from the data, this paper proposes a machine learning classifier method, an LDA topic analysis method, and a method for analyzing cooperative relationships between schools. Machine learning classifiers are used to analyze data from different disciplines and different periods, and keywords are analyzed based on the classification results. In addition, the LDA topic model is used to obtain keywords across all the data and to study the main words of the dissertation data. School names are extracted from the data to study the cooperative relationships among schools. Experiments show that these methods can effectively handle some text mining problems. (3) For text generation, this paper studies the generation of acknowledgment text. It summarizes templates for acknowledgments, uses a clustering algorithm to generate text at the sentence level, and uses LSTM-related models to generate acknowledgment text. Experiments show that these methods have some application value for text generation. |
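
The school-cooperation analysis described in item (2) can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes each record pairs a dissertation's home school with its acknowledgment text, and that school names are found by plain substring matching against a known list. The names `KNOWN_SCHOOLS`, `mentioned_schools`, and `cooperation_counts`, and the toy records, are all hypothetical.

```python
from collections import Counter

# Hypothetical placeholder names; a real study would need a full list of
# institutions or a named-entity recognizer.
KNOWN_SCHOOLS = ["Alpha University", "Beta University", "Gamma University"]


def mentioned_schools(text, schools=KNOWN_SCHOOLS):
    """Return the set of known school names that appear in the text."""
    return {name for name in schools if name in text}


def cooperation_counts(records, schools=KNOWN_SCHOOLS):
    """Count undirected (home school, mentioned school) pairs.

    `records` is an iterable of (home_school, acknowledgment_text) tuples.
    A mention of the home school itself is not counted as cooperation.
    """
    edges = Counter()
    for home, text in records:
        for other in mentioned_schools(text, schools):
            if other != home:
                # Sort the pair so (A, B) and (B, A) share one key.
                edges[tuple(sorted((home, other)))] += 1
    return edges


# Toy, made-up acknowledgment snippets for illustration only.
records = [
    ("Alpha University", "I thank Professor Li of Beta University for his advice."),
    ("Beta University", "Thanks to the lab at Alpha University and to my family."),
    ("Gamma University", "I am grateful to my advisor at Gamma University."),
]

edges = cooperation_counts(records)
for (a, b), weight in edges.most_common():
    print(a, "<->", b, weight)
```

The resulting pair counts can serve as edge weights in a cooperation graph between schools. Substring matching is the simplest possible choice here and would miss abbreviations and alternative spellings of a school's name.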