Design And Implementation Of Summarization System For Three Types Of Announcements Of Listed Companies

Posted on: 2021-01-31
Degree: Master
Type: Thesis
Country: China
Candidate: Z F Zhou
GTID: 2428330614958261
Subject: Electronic and communication engineering
Abstract/Summary:
Listed company announcements are reports issued by companies in accordance with the requirements of the China Securities Regulatory Commission. The rich information they contain plays an important role in the analysis of business conditions, investment decisions, and departmental supervision. Automatic summarization lets readers quickly obtain the core information of an announcement and improves reading efficiency. However, announcements have a complicated structure, lengthy content, and much interfering information, so traditional summarization methods struggle to generate accurate and fluent summaries. This thesis proposes a summarization method based on summary templates. Taking three types of announcements as research objects, it designs three different summary element extraction methods and fills the extracted summary elements into the templates to generate the announcement summary. The work is divided into three parts:

1. The characteristics of announcement structure are analyzed, and a classification method based on document structure is proposed. The announcement title and first-level subtitles are used as classification features. First, an announcement subtitle extraction scheme is designed. Second, the extracted subtitles and the announcement title are combined into a feature text, which is vectorized using pre-trained word vectors. Finally, a convolutional neural network is constructed to classify the announcement. Experiments show that the scheme achieves an F1 value of 98.47% on the announcement classification task.

2. Summary templates are formulated for three types of announcements: lifting of bans, periodic reports, and mergers and acquisitions (M&A). Based on these templates, summary elements are extracted and filled into the templates to generate the summary. For lifting-of-bans announcements, a rule-based method is used to match the summary sentences, and the announcement summary is formed after post-processing. For periodic report announcements, the summary fields are extracted by means of table positioning and table analysis, and then filled into the template to generate the summary. For M&A announcements, extraction is divided into two stages: summary sentence extraction and summary field extraction. In the first stage, the summary sentences are extracted by combining title rules and content rules. First, a set of rule identifiers is customized. Second, the title rules, content rules, and rule combination expressions at various levels are formulated according to the identifiers. Finally, the combined expressions are parsed to extract the summary sentences. In the second stage, a multi-feature fusion named entity recognition model is built to recognize the summary fields in the summary sentences. First, word vectors and character vectors are pre-trained on the corpus. Second, a domain dictionary is used to construct feature vectors, which are merged with the word vectors and character vectors. Third, a Long Short-Term Memory (LSTM) network is used to model the context semantics, and a conditional random field (CRF) is used to obtain the best label sequence for the text. Finally, the summary fields are extracted by means of tag parsing and filled into the template to generate the announcement summary. In experiments, the average F1 of summary sentence extraction for lifting-of-bans announcements reaches 98.59%; the average F1 of summary field extraction for periodic reports reaches 98.10%; and for M&A announcements, the average F1 of summary sentence extraction reaches 96.47% and the average F1 of summary field recognition reaches 93.51%. The experimental results show that, given a summary template, the accuracy of summary element extraction can be ensured by choosing a different extraction method for each announcement category.

3. For the three types of announcements (lifting of bans, periodic reports, and M&A), an announcement summarization system is designed and implemented on the basis of the theoretical research and according to actual business needs. The system includes four modules: an announcement acquisition module, an announcement classification module, an announcement summary generation module, and a display and storage module. The functions of the modules are integrated to realize the automatic generation of announcement summaries. In addition, the entire system and the individual modules are tested.
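
The last step of the M&A pipeline, tag parsing followed by template filling, can be illustrated with a minimal sketch. It assumes the recognizer emits BIO-style tags (`B-X`, `I-X`, `O`), which the abstract does not specify. The field names `ACQUIRER`, `TARGET`, and `PRICE`, the template wording, and the sample sentence are all hypothetical and are not taken from the thesis.

```python
from typing import Dict, List


def parse_bio_tags(tokens: List[str], tags: List[str], sep: str = " ") -> Dict[str, List[str]]:
    """Group tokens into labeled spans, assuming a BIO tagging scheme.

    `sep` joins the tokens of one span; use "" for character-level Chinese text.
    """
    fields: Dict[str, List[str]] = {}
    current_type = None          # label of the span being built, if any
    current_tokens: List[str] = []

    def flush() -> None:
        # Store the span collected so far and reset the buffer.
        nonlocal current_type, current_tokens
        if current_type is not None:
            fields.setdefault(current_type, []).append(sep.join(current_tokens))
        current_type, current_tokens = None, []

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()                      # a new span closes the previous one
            current_type = tag[2:]
            current_tokens = [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)  # continue the open span
        else:
            flush()                      # "O" or an inconsistent "I-" tag ends the span
    flush()
    return fields


def fill_template(template: str, fields: Dict[str, List[str]]) -> str:
    """Fill a summary template with the first span found for each field.

    Raises KeyError if the template needs a field that was not extracted.
    """
    first_values = {name: spans[0] for name, spans in fields.items()}
    return template.format(**first_values)


# Hypothetical summary sentence and tag sequence, for illustration only.
tokens = ["Acme", "Corp", "plans", "to", "acquire", "Beta", "Ltd",
          "for", "5", "million", "yuan"]
tags = ["B-ACQUIRER", "I-ACQUIRER", "O", "O", "O", "B-TARGET", "I-TARGET",
        "O", "B-PRICE", "I-PRICE", "I-PRICE"]

# Hypothetical template; the thesis's actual templates are not given in the abstract.
template = "{ACQUIRER} plans to acquire {TARGET} for {PRICE}."

fields = parse_bio_tags(tokens, tags)
summary = fill_template(template, fields)
print(summary)
```

Keeping the tag parser separate from the template means the same parser could serve any announcement category, with only the template and label set changing per category.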
Keywords/Search Tags: announcement summarization, summary template, classification, rule, named entity recognition