
Research And Application Of Domain-Specific New Word Detection Combining Statistics And Semantics

Posted on: 2021-02-28  Degree: Master  Type: Thesis
Country: China  Candidate: Q W Liu
GTID: 2518306107952879  Subject: Electronics and Communications Engineering
Abstract/Summary:
With the development of science and technology, more and more new technologies are emerging in various fields. At the same time, a great deal of text data has been produced, and it contains new words waiting to be discovered. Expanding the lexicon is an ongoing task in every field, so accurately mining and identifying new words in specific domains has become an important research problem. This thesis takes an unsupervised approach and proposes the DTopWordS-SS model. Building on the existing DTopWordS model, it introduces an improved Apriori method and an improved information entropy model, so that the resulting model blends statistics and semantics. This improves the accuracy of domain new word detection, identifies unknown words better, supports building a relatively complete dictionary, and uncovers more low-frequency words that belong to the domain category. The model was evaluated on the corpus used by DTopWordS, and the experiments show that DTopWordS-SS can mine and analyze domain words along multiple dimensions and effectively improves the accuracy of new word mining in the domain. In addition, the DTopWordS-SS model was applied to a data set of scientific research management documents, and new words in different fields were mined from this data year by year, which gives an intuitive picture of the research focus in each field. A thesaurus was also built for each year, and each year's hot words were selected by comparing the frequency distribution of the same word across years. Overall, the model effectively mines new words from text, presents research hot spots, and supports intuitive analysis of scientific research documents, which gives it both research value and practical application value.
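The abstract does not spell out its information entropy model. As a rough illustration only, the sketch below computes the left and right branch entropy commonly used in statistical new word detection: a candidate string that appears next to many different characters is more likely to be a free-standing word. This is an assumption about the general technique, not the DTopWordS-SS formulation, and the names `branch_entropy` and `score_candidates` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def branch_entropy(neighbors):
    """Shannon entropy (in bits) of a Counter of neighboring characters."""
    total = sum(neighbors.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in neighbors.values())


def score_candidates(text, max_len=4):
    """Return {candidate: (frequency, min(left entropy, right entropy))}.

    Candidates are all substrings of length 2..max_len. This is a generic
    sketch (assumption), not the thesis's improved entropy model.
    """
    freq = Counter()
    left = defaultdict(Counter)   # characters seen immediately before a candidate
    right = defaultdict(Counter)  # characters seen immediately after a candidate
    n = len(text)
    for i in range(n):
        for k in range(2, max_len + 1):
            if i + k > n:
                break
            cand = text[i:i + k]
            freq[cand] += 1
            if i > 0:
                left[cand][text[i - 1]] += 1
            if i + k < n:
                right[cand][text[i + k]] += 1
    return {
        cand: (count, min(branch_entropy(left[cand]), branch_entropy(right[cand])))
        for cand, count in freq.items()
    }
```

In practice a candidate would be kept only when both its frequency and its minimum branch entropy exceed thresholds tuned on the corpus. Such thresholds are not given in the abstract.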
Keywords/Search Tags:Domain-Specific New Words Detection, DTopWordS, Statistics, Semantic, Hot words