
Research On Multi-granularity Structured Parsing Method For Online Preferential Enterprise Policy Texts

Posted on: 2022-06-08
Degree: Master
Type: Thesis
Country: China
Candidate: B C Yi
GTID: 2518306524482704
Subject: Management Science and Engineering
Abstract/Summary:
With the continuous development of China's economy and society, small and medium-sized enterprises (SMEs) now occupy an important strategic position in the country's economic and social development. To support their further development, the central and local governments have issued a large number of preferential policy documents, and various localities have launched government-enterprise service platforms to promote better exchange between SMEs and the government. However, these platforms simply aggregate the policies of various government departments, and few analyze or study the policy texts further. To serve SMEs better, it is necessary to parse the lengthy policy texts structurally and extract knowledge units at different levels, so as to meet the needs of SMEs at different levels: quickly locating the target policy module through modular processing of the text, mastering the policy content through keywords, and grasping policy dynamics through themes. Therefore, this paper parses policy texts structurally at three granularity levels (text, keyword, and theme) in order to make government-enterprise service platforms more information-based, digital, and intelligent. The main contents of this paper are as follows.

First, to help enterprise users quickly locate the target module of a policy, at the text granularity this paper constructs a two-stage text classification model based on three-way decisions. The first stage combines deep learning with traditional machine learning. According to the proposed classification confidence and classification ability, objects that are poorly discriminated in the first stage are placed in the boundary domain to await secondary classification, so that unstructured policy text is parsed into structured module content. The five structured modules are core content, support conditions, application materials, application process, and contact information, which makes targeted access and browsing easier for enterprise users.

Second, to highlight the focus of a policy, the policy content is condensed and summarized so that enterprise users can quickly understand it at the micro level. At the keyword granularity, this paper proposes an integrated oversampling method from the perspective of dataset balance. Based on the text classification results, the core content module is used to construct the initial dataset. New samples are then synthesized by cyclic sampling, transforming the imbalanced dataset into a class-balanced one and thereby improving keyword recognition. In each sampling round, the training set is divided by classification confidence into a positive domain, a negative domain, and a boundary domain using three-way decisions and logistic regression, and a different sample synthesis strategy is adopted for each domain. Finally, the class-balanced dataset is fed to a machine learning algorithm to extract keywords. The extracted keywords help enterprise users quickly grasp the main information of a policy and consult and retrieve policies.

Third, to understand and analyze policy content from a macro perspective and to understand policy tendencies, this paper proposes a topic discovery method based on the co-word network at the topic granularity. The extracted keywords are used to construct the initial dataset. By considering the social attributes of the co-word network, quantifying the relationships between direct and indirect neighbors, and mining the association strength between text keywords, the concept of comprehensive influence degree is proposed, and the DEMATEL method is improved to identify core keywords. Hierarchical clustering of these core keywords then summarizes the themes of the policy text. Extracting themes helps enterprise users grasp current policy trends and tendencies from a macro perspective.

To meet the needs of SMEs at different levels in practical application, this paper analyzes preferential policy texts at three granularity levels (text, keyword, and theme) and proposes new solutions for each. The proposed methods not only enrich and extend the related research theory but also guide the construction of government-enterprise service platforms in practice.
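
The three-way decision step mentioned in the abstract can be illustrated with a minimal Python sketch that splits samples by classification confidence into positive, negative, and boundary domains. The function name, the thresholds `alpha` and `beta`, and the example probabilities are illustrative assumptions; the abstract does not state how the thesis defines or sets them.

```python
from typing import List, Sequence, Tuple


def three_way_partition(
    probs: Sequence[float],
    alpha: float = 0.7,  # assumed acceptance threshold (hypothetical value)
    beta: float = 0.3,   # assumed rejection threshold (hypothetical value)
) -> Tuple[List[int], List[int], List[int]]:
    """Split sample indices into positive, negative, and boundary domains.

    probs[i] is the estimated probability that sample i belongs to the
    target class, e.g. the output of a first-stage classifier.
    """
    if not 0.0 <= beta < alpha <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= beta < alpha <= 1")

    positive: List[int] = []
    negative: List[int] = []
    boundary: List[int] = []
    for i, p in enumerate(probs):
        if p >= alpha:
            positive.append(i)   # confident accept
        elif p <= beta:
            negative.append(i)   # confident reject
        else:
            boundary.append(i)   # deferred for a second decision
    return positive, negative, boundary


# Illustrative confidences only, not data from the thesis.
example_probs = [0.9, 0.1, 0.5, 0.7, 0.3]
pos, neg, bnd = three_way_partition(example_probs)
```

Under this reading, only the samples in the boundary domain would be passed to a second-stage classifier, or given their own sample synthesis strategy in the oversampling setting.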
Keywords/Search Tags:Three-way decisions, text classification, sampling method, co-word analysis, DEMATEL