
Research on the Information Technology Word Collocation Extraction Method Based on Multi-Statistical Method Cascade

Posted on: 2020-03-17  Degree: Master  Type: Thesis
Country: China  Candidate: S W Wu  Full Text: PDF
GTID: 2428330611999780  Subject: Electronic and communication engineering
Abstract/Summary:
Research on Chinese word collocations continues to deepen. However, little work has addressed word collocation in professional domains, which makes it a new direction for collocation extraction. With the advent of the 5G era, research on natural language processing in information technology will soon deepen as well. The information technology collocations studied in this thesis, which are used to build an information technology knowledge base, are therefore an important research direction in the language science of the information technology field.

Word discovery in information technology based on mutual information and adjacent entropy. This thesis studies methods for extracting information technology word collocations. Professional vocabulary is one of the main differences between the information technology context and the general context, and most information technology terms are compound words. Without an information technology lexicon, word segmentation errors occur easily, the structure of compound words is destroyed, and the accuracy of collocation extraction drops. Traditional word discovery methods are mainly based on word frequency or on rules. Frequency-based methods tend to misplace the boundaries of new words and have low accuracy. Rule-based methods require rules to be formulated again and again, which is laborious and of little value. This thesis combines mutual information and adjacent entropy to discover professional vocabulary. Mutual information measures how tightly a word is bound to its adjacent words, and adjacent entropy delimits the boundaries of professional words. Together they remove the workload of formulating rules and increase the efficiency and accuracy of professional word discovery.

Extraction of information technology word collocations based on a multi-statistic cascade. Each traditional statistical method of collocation extraction has its own shortcomings, and both the extraction accuracy and the comprehensive evaluation index are low. Statistical methods therefore need to be screened and combined to improve extraction accuracy. In this thesis, after the discovered information technology vocabulary is applied, the candidates are screened step by step with a multi-statistic cascade to obtain the set of information technology word collocations. Compared with extraction by the traditional statistical methods, the method used in this thesis improves both the extraction accuracy on information technology corpora and the comprehensive evaluation index.

In summary, this thesis studies in depth the professional character of information technology corpora and, based on that character, performs professional word discovery and multi-statistic cascade collocation extraction, which improves the accuracy of collocation extraction.
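The word discovery step described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names `score_bigrams` and `candidate_terms`, the use of pointwise mutual information over adjacent token pairs, the base-2 logarithm, and both thresholds are assumptions chosen for illustration. Pointwise mutual information stands in for the closeness of two adjacent words, and left and right adjacent entropy stand in for how freely the pair's boundaries vary.

```python
import math
from collections import Counter, defaultdict


def score_bigrams(tokens):
    """Return {(w1, w2): (pmi, left_entropy, right_entropy)} for adjacent pairs.

    Hypothetical helper: the thesis does not specify its exact formulas.
    """
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    left = defaultdict(Counter)   # tokens seen immediately before each pair
    right = defaultdict(Counter)  # tokens seen immediately after each pair
    for i in range(n - 1):
        pair = (tokens[i], tokens[i + 1])
        if i > 0:
            left[pair][tokens[i - 1]] += 1
        if i + 2 < n:
            right[pair][tokens[i + 2]] += 1

    def entropy(counter):
        # Shannon entropy (bits) of the neighbor distribution; 0.0 if no neighbors.
        total = sum(counter.values())
        if total == 0:
            return 0.0
        return sum(c / total * math.log2(total / c) for c in counter.values())

    scores = {}
    for (w1, w2), count in bigrams.items():
        p_xy = count / (n - 1)
        p_x = unigrams[w1] / n
        p_y = unigrams[w2] / n
        pmi = math.log2(p_xy / (p_x * p_y))
        scores[(w1, w2)] = (pmi, entropy(left[(w1, w2)]), entropy(right[(w1, w2)]))
    return scores


def candidate_terms(scores, min_pmi=1.0, min_entropy=0.5):
    """Keep pairs that are tightly bound (PMI) and have free boundaries (entropy).

    Both thresholds are illustrative assumptions, not values from the thesis.
    """
    return sorted(
        pair
        for pair, (pmi, left_h, right_h) in scores.items()
        if pmi >= min_pmi and min(left_h, right_h) >= min_entropy
    )


tokens = ("neural network model uses neural network layer "
          "and neural network weights").split()
print(candidate_terms(score_bigrams(tokens)))
```

In this toy corpus the pair "neural network" appears with varied neighbors on both sides, so its adjacent entropy is high, while a pair such as "network model" always has the same neighbors and is rejected at the entropy step even though its mutual information is comparable. This shows why a frequency or association score alone may not fix word boundaries.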
Keywords/Search Tags:word collocation, collocation extraction, knowledge base, corpus