Research On Terminology Recognition Of Chinese Information Science Theory Based On Deep Learning

Posted on: 2020-09-28
Degree: Master
Type: Thesis
Country: China
Candidate: Q Guan
GTID: 2518305732973619
Subject: Books intelligence
Abstract/Summary:
The study of theories and methods is the driving force behind the continuous development of a discipline, so understanding how current theories and methods are applied and developed in a subject area is an important task. In information science, literature reading, inductive summarization, content analysis and bibliometrics are the traditional research methods used to reveal how theories and methods are applied and developed. Most of these methods rely on manual work and place certain requirements on the professional background of the people involved. Because manpower is limited, such studies usually work from sample surveys, so the data are not comprehensive enough and the analysis results may be partial.

This paper therefore uses term recognition, a branch of the named entity recognition task, to study the theories and methods of information science. About 20,000 articles published in the field of information science over the past 20 years are collected, and the deep learning model Bi-LSTM-CRF is trained and tested on this large-scale corpus. Experiments verify the feasibility of the approach and explore the impact of each experimental variable on the model, with the aim of maximizing recognition performance. In the data processing stage, a dictionary of information science theories and methods is built and introduced into the word segmentation system, which improves word segmentation accuracy. To obtain the training corpus, the BIO notation is used to label theory, method and model term entities of information science.

The experiments explore how the following variables influence the term recognition results:

(1) Corpus granularity. The paper compares corpora of different granularity built around word segmentation. The comparison finds that training on the corpus based on word segmentation recognizes theory and method term entities better. Statistics show that terms 4-6 words long account for 60% of theory and method terms.

(2) Training corpus. Four groups of controlled trials were designed, trained on 20%, 40%, 60% and 80% of the corpus. The results are positively correlated with the proportion of the corpus used for training.

(3) Entity type. The three entity types of theory, method and model are compared. Model term entities are recognized best, followed by theory term entities, and method term entities are recognized worst. A study of the differences between the types shows that model term entities have the most obvious word-formation characteristics, while method term entities are the most numerous and have no obvious word-formation features.

(4) Features. Four features are introduced in controlled experiments: word vector, part of speech, word length and pinyin. Every feature except pinyin improves the F1 value, notably the word vector and part-of-speech features.

To further evaluate how well the Bi-LSTM-CRF model recognizes theory and method term entities in information science, the paper introduces the traditional CRF model and runs comparative experiments with the same data and experimental methods. The results show that the CRF model performs better on simple recognition tasks, but the Bi-LSTM-CRF model works better when the recognition task becomes complicated and the corpus volume increases significantly.

Finally, combining the results of manual annotation and model recognition, the paper carries out quantitative analysis and visualization of the theories and methods used in information science over the past 20 years. General conclusions are drawn from the overall application of theories and methods, their evolution over time, and a joint analysis with publishing institutions and journals.
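
The BIO notation mentioned in the abstract can be illustrated with a minimal sketch. It assumes a longest-match lookup against a small term lexicon, which is a stand-in chosen for illustration and not necessarily the thesis's labeling procedure. The function name `bio_tag`, the tag names `THEORY`, `METHOD` and `MODEL`, and the English lexicon entries are all hypothetical; the thesis corpus itself is Chinese.

```python
from typing import Dict, List, Tuple


def bio_tag(tokens: List[str], lexicon: Dict[Tuple[str, ...], str]) -> List[str]:
    """Label tokens with BIO tags using longest-match lookup in a term lexicon.

    B-<TYPE> marks the first token of a term, I-<TYPE> marks the tokens
    inside it, and O marks tokens outside any term.
    """
    labels = ["O"] * len(tokens)
    max_len = max((len(term) for term in lexicon), default=0)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first so longer terms win.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            entity_type = lexicon.get(tuple(tokens[i:i + n]))
            if entity_type is not None:
                labels[i] = f"B-{entity_type}"
                for j in range(i + 1, i + n):
                    labels[j] = f"I-{entity_type}"
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return labels


# Hypothetical lexicon entries, for illustration only.
LEXICON = {
    ("content", "analysis", "method"): "METHOD",
    ("information", "foraging", "theory"): "THEORY",
    ("vector", "space", "model"): "MODEL",
}

tokens = "we apply the content analysis method and the vector space model".split()
labels = bio_tag(tokens, LEXICON)
for token, label in zip(tokens, labels):
    print(f"{token}\t{label}")
```

Each token receives exactly one label, so the output can be written one token per line in the two-column form that sequence labeling models such as Bi-LSTM-CRF are commonly trained on.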
Keywords/Search Tags:Information science, terminology recognition, deep learning, Bi-LSTM-CRF model