
Research And Implementation Of Domain Dictionary Automation Construction Technology Based On Supervised Learning

Posted on: 2020-12-16
Degree: Master
Type: Thesis
Country: China
Candidate: K Si
Full Text: PDF
GTID: 2428330623951400
Subject: Computer technology
Abstract/Summary:
In recent years, with the rapid development of science and technology, the ways people communicate have changed dramatically. Every day, more than one million electronic documents circulate on the Internet, and literature from every discipline appears in large volumes, bringing with it a large amount of new domain vocabulary. Domain vocabulary embodies and carries the core knowledge of today's disciplines, and changes in that vocabulary reflect, at different levels, how a subject area develops over time and space. Domain vocabulary therefore makes it easier to understand the current state and future trends of a subject area, and a better understanding of domain knowledge has important theoretical and practical significance. For these reasons, a better method for constructing domain dictionaries is urgently needed. This thesis focuses on domain dictionary construction techniques based on supervised learning. The main work is as follows:
(1) The thesis first introduces and summarizes the technologies related to domain dictionary construction. To address the low accuracy of traditional construction methods, it proposes a domain dictionary construction method based on supervised learning. First, the text is preprocessed to ensure the accuracy of keyword extraction, and features are extracted for each word. Then a classifier is trained with LightGBM to extract the keywords of each article. Finally, the domain dictionary is constructed according to a set of rules proposed in the thesis.
(2) To verify the accuracy and feasibility of the method, two sets of experiments were designed. The first compares the method with the TextRank keyword extraction algorithm; the results show that the proposed method is accurate and feasible. The second uses the constructed domain dictionary to identify domain documents; the results show that the document identification accuracy is relatively high. Together, the experiments show that the proposed method is accurate and feasible.
(3) Based on this method, the thesis implements a domain dictionary construction system that supports domain dictionary expansion and domain document recognition. Verification shows that the domain dictionary extracted by the system is highly accurate and can be continuously updated.
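The abstract does not list the word features the thesis uses, so the following is only a minimal sketch of what the per-word feature extraction step might look like. The feature names (`tf`, `first_pos`, `length`, `in_title`), the stop-word list, and the helper names `tokenize` and `word_features` are all assumptions made for illustration, not the author's actual design.

```python
import re
from collections import Counter

# Hypothetical stop-word list (an assumption); a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "on", "is", "to", "for"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def word_features(title, body):
    """Return {word: feature dict} for every non-stop-word in the body.

    The four features are illustrative assumptions, not the thesis's feature set.
    """
    tokens = tokenize(body)
    title_words = set(tokenize(title))
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    total = sum(counts.values())

    # Record the index of the first occurrence of each candidate word.
    first_pos = {}
    for i, tok in enumerate(tokens):
        if tok in counts and tok not in first_pos:
            first_pos[tok] = i

    features = {}
    for word, count in counts.items():
        features[word] = {
            "tf": count / total,                         # share among candidate words
            "first_pos": first_pos[word] / len(tokens),  # relative first position
            "length": len(word),                         # word length in characters
            "in_title": int(word in title_words),        # 1 if the word is in the title
        }
    return features


feats = word_features(
    "Domain dictionary construction",
    "The domain dictionary is built from keywords. "
    "Keywords come from the domain text.",
)
print(feats["domain"])
```

Rows like these, paired with keyword/non-keyword labels from annotated articles, are the kind of input a gradient-boosted classifier such as LightGBM could be trained on.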
Keywords/Search Tags: keyword extraction, supervised learning, domain dictionary construction, LightGBM