| Structured information extracted from the merger and acquisition (M&A) announcements of listed companies can provide effective data support for investment and financing decisions, market supervision, stock market prediction, enterprise profiling and other fields, and has become an important part of application service development for the stock and securities markets. How to structure the information in M&A and restructuring announcements accurately and efficiently is becoming one of the most important problems for financial and securities companies. An M&A announcement is a long free text with a fixed format. Based on these characteristics, this thesis proposes a new announcement information extraction scheme that combines a rule-based method with a sequence labeling method. The scheme consists of two parts. The first part is "sentence-level" extraction, which uses the rule-based method to reduce "text-level" extraction to "sentence-level" extraction by locating content through its titles. First, a text structure tree is extracted from the announcement text and stored in a defined format; second, a rule label system is designed to constrain how rule templates are written; finally, a rule logic extraction engine parses the rule templates and extracts the required sentence set from the announcement text. The second part is "field-level" extraction, which uses the sequence labeling method: a joint sequence labeling model based on a bidirectional gated recurrent unit (BiGRU) network and an attention mechanism is trained to extract field information from the sentence set. First, GloVe word vectors map the word sequence to low-dimensional real-valued vectors; then, the contextual semantic information of the text is obtained through the BiGRU network; next, an attention layer that fuses the related-entity matrix produces the weight distribution over entities, so that the semantic information between pairs of related entities can be learned effectively; finally, a conditional random field (CRF) layer obtains the optimal tag sequence, and the final field information is obtained by parsing the tags. The experimental results show that the scheme achieves an average precision of 93.46%, an average recall of 91.52%, and an average F1 value of 92.52%. This shows that the scheme is feasible and practical for information extraction from M&A announcements, and it also offers a solution for information extraction from long free text in general. Based on this scheme, the thesis designs and implements an information extraction system for M&A and restructuring announcements according to actual needs. The system consists of a data capture module, a "sentence-level" extraction module, a "field-level" extraction module, a data storage module and a manual interaction module, and it structures the information in M&A and restructuring announcements accurately and efficiently. |
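
The "sentence-level" step, in which titles are used to locate content, can be illustrated with a minimal Python sketch. It is not the thesis's rule engine: the numbered-heading pattern, the flat section list (standing in for the text structure tree), the rule dictionary with its `title_keywords` and `sentence_keywords` keys, and the sample announcement are all assumptions made for illustration.

```python
import re

# Hypothetical heading pattern: numbered titles such as "1. Transaction Overview".
# Body lines that happen to start with a number would also match; a real
# structure-tree extractor would need a stricter rule.
HEADING = re.compile(r"^\d+(?:\.\d+)*\.?\s+(.+)$")


def split_sections(text):
    """Group body lines under the most recent heading line."""
    sections = []
    title, body = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = HEADING.match(line)
        if match:
            if title is not None:
                sections.append((title, " ".join(body)))
            title, body = match.group(1), []
        elif title is not None:
            body.append(line)
    if title is not None:
        sections.append((title, " ".join(body)))
    return sections


def extract_sentences(text, rule):
    """Return sentences under a matching title that contain a rule keyword."""
    hits = []
    for title, body in split_sections(text):
        if not any(k in title for k in rule["title_keywords"]):
            continue
        # Split after sentence-ending punctuation followed by whitespace.
        for sentence in re.split(r"(?<=[.;])\s+", body):
            if any(k in sentence for k in rule["sentence_keywords"]):
                hits.append(sentence)
    return hits


# Invented sample announcement and rule template (assumptions, not thesis data).
SAMPLE = """1. Transaction Overview
The acquirer is Company A. The target is Company B.
2. Transaction Price
The transaction price is 500 million yuan. Payment is made in cash.
"""
RULE = {"title_keywords": ["Price"], "sentence_keywords": ["price"]}

if __name__ == "__main__":
    print(extract_sentences(SAMPLE, RULE))
```

Restricting the search to sections whose titles match the rule is what reduces the problem from the whole text to a small sentence set, which a sequence labeling model could then process field by field.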