Research On Multi-granularity Structured Parsing Method For Online Preferential Enterprise Policy Texts

Posted on:2022-06-08

Degree:Master

Type:Thesis

Country:China

Candidate:B C Yi

Full Text:PDF

GTID:2518306524482704

Subject:Management Science and Engineering

Abstract/Summary:

As China's economy and society continue to develop, small and medium-sized enterprises (SMEs) have come to occupy an important strategic position. To support their further development, the central and local governments have issued a large number of preferential policy documents, and many localities have launched government enterprise service platforms to improve communication between SMEs and the government. However, these platforms mostly just aggregate the policies of various government departments; few analyze the policy texts any further. To serve SMEs better, lengthy policy texts need to be parsed structurally and knowledge units extracted at different levels, so that users can quickly locate the target policy module through modular processing of the text, master the policy content through keywords, and follow policy dynamics through themes. This thesis therefore parses policy texts structurally at three granularity levels (text, keyword, and theme) in order to make government enterprise service platforms more information-based, digital, and intelligent. The main contents are as follows.

First, to help enterprise users quickly locate the target module of a policy, this thesis constructs, at the text granularity, a two-stage text classification model based on three-way decisions. The first stage combines a deep learning method with a traditional machine learning method. According to the proposed measures of classification confidence and classification ability, objects that the first stage discriminates poorly are placed in the boundary domain to await secondary classification. In this way, unstructured policy text is parsed into five structured modules: core content, support conditions, application materials, application process, and contact information. This lets enterprise users access and browse the part they need.

Second, to highlight the focus of a policy, the policy content is condensed and summarized so that enterprise users can quickly understand it at the micro level. At the keyword granularity, this thesis proposes an integrated oversampling method from the perspective of dataset balance. Based on the text classification results, the core content module is used to construct the initial dataset. New samples are then synthesized by cyclic sampling, transforming the imbalanced dataset into a class-balanced one and thereby improving keyword recognition. In each round of sampling, the training set is divided by classification confidence into a positive domain, a negative domain, and a boundary domain, using three-way decisions and logistic regression, and a different sample-synthesis strategy is adopted for each domain. Finally, a machine learning algorithm is trained on the class-balanced dataset to extract keywords. The extracted keywords help enterprise users quickly grasp the main information of a policy and consult and retrieve policies.

Third, to understand and analyze policy content and policy tendency from a macro perspective, this thesis proposes, at the theme granularity, a topic discovery method based on a co-word network. The extracted keywords are used to construct the initial dataset. By considering the social attributes of the co-word network, quantifying the relationships between direct and indirect neighbors, and mining the association strength between keywords, the thesis proposes the concept of comprehensive influence degree and improves the DEMATEL method to identify core keywords. Hierarchical clustering of these core keywords then summarizes the themes of the policy text. Extracting themes helps enterprise users grasp current policy trends and tendencies from a macro perspective.

To meet the different levels of need that SMEs have in practice, this thesis parses preferential policy texts at the text, keyword, and theme granularity levels and proposes a new solution for each. The proposed methods both enrich and extend the relevant research theory and offer practical guidance for building government enterprise service platforms.
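The three-way partition that the abstract uses in both the classification and the oversampling stages can be sketched as follows. This is only a minimal illustration of the general three-way decision rule, not the thesis's model: the function name `three_way_partition`, the thresholds `alpha` and `beta`, and the sample scores are all assumptions made for the example, and the abstract does not state how its classification confidence is actually computed.

```python
from typing import Dict, List, Sequence


def three_way_partition(
    confidences: Sequence[float],
    alpha: float = 0.7,  # assumed acceptance threshold (hypothetical value)
    beta: float = 0.3,   # assumed rejection threshold (hypothetical value)
) -> Dict[str, List[int]]:
    """Split sample indices into positive, boundary, and negative domains.

    A sample is accepted when its confidence is at least alpha, rejected
    when it is at most beta, and deferred to the boundary domain otherwise.
    """
    if not 0.0 <= beta < alpha <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= beta < alpha <= 1")

    regions: Dict[str, List[int]] = {"positive": [], "boundary": [], "negative": []}
    for index, confidence in enumerate(confidences):
        if confidence >= alpha:
            regions["positive"].append(index)
        elif confidence <= beta:
            regions["negative"].append(index)
        else:
            # Too uncertain to decide now: wait for a second-stage decision.
            regions["boundary"].append(index)
    return regions


# Hypothetical first-stage confidences for five text segments.
scores = [0.95, 0.55, 0.10, 0.72, 0.30]
regions = three_way_partition(scores)
print(regions)
```

In a two-stage setup of the kind described, only the indices in `regions["boundary"]` would be passed on to the second classifier; the other two domains keep their first-stage decision.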

Keywords/Search Tags:

Three-way decisions, text classification, sampling method, co-word analysis, DEMATEL

Related items

1	Exploring Dialogue Text Classification Based On Word Mixture Vectors
2	Research Of Imbalanced Text Tendency Classification For Network Public Opinion Based On Three-way Decisions
3	Research On A Text Classification Method Based On The Concatenated Of Word Vector And Doc2vec
4	Research On Sentiment Classification Of Chinese Micro-blog Text Based On Three-way Decisions
5	Research On Imbalanced Data Sampling Methods For Text Sentiment Classification
6	Multi-granular Text Sentiment Classification For Method Research Based On Machine Learning
7	Research And Application Of Internet Chinese Text Classification
8	Research And Implementation Of Text Sentiment Analysis System Based On Neural Network Model
9	Research On Text Classification Method Based On Convolutional Neural Network
10	Chinese Keyword Extraction Method Based On Word Span And Its Application In Text Classification