With the development of computers and the growing popularity of the Internet, massive volumes of text have greatly enriched our sources of information. This is especially true in the financial field, where unstructured texts such as financial news, company announcements, and industry research reports enrich the information sources of investment banks and brokerages. However, reading and parsing these texts costs professionals a great deal of time, so automatically extracting the data we need from massive free text has become an urgent problem. Information extraction has therefore become an important research area of natural language processing, and event extraction is an important research direction within it. Event extraction pulls the event information a user is interested in out of unstructured text, so that events described in natural language can be represented in a structured form. This paper therefore proposes an event extraction approach for the financial field. First, a web crawler system based on a free-text extraction algorithm is developed in Python and used to crawl and analyze data from financial websites. Next, event extraction in the financial field is performed on the acquired text data using the Aho-Corasick (AC) automaton, pattern matching, and other methods. The main contents of this paper are as follows:
(1) News text is automatically extracted from financial websites using clustering techniques.
(2) Seed trigger words for event sentences are obtained through dependency parsing, which identifies subject-predicate and verb-object relations.
(3) The news corpus is segmented into words with the open-source word segmentation tool "knot word", and a Word2Vec model is trained on it to generate word vectors. Keywords are then clustered by word-vector similarity. This greatly improves both the coverage of trigger words for different event categories and the speed at which they are obtained, while reducing the work of constructing the dictionary.
(4) To address the high proportion of non-event sentences in event extraction, this paper pre-classifies events by matching trigger words with the AC automaton algorithm. A decision tree algorithm then further classifies the candidate event sentences, which improves the efficiency of event classification.
(5) To identify and extract event elements through pattern recognition, this paper proposes three strategies for different event types: an entity recognition method, an entity generalization method, and an entity structure method.
This paper also builds a company dictionary, a government department dictionary, a job title dictionary, and other related dictionaries to improve word segmentation, and these dictionaries were successfully applied in the word segmenter. For named entity recognition, this paper first performs initial recognition with LTP (Language Technology Platform) from the Research Center for Social Computing and Information Retrieval at Harbin Institute of Technology (HIT), and then improves the recall of named entity recognition through a second recognition pass against the entity dictionaries. Further analysis of the experimental results shows that the accuracy and recall of the proposed event extraction method reach a high level for the field, which verifies the validity and feasibility of the proposed method. The event extraction technology proposed in this paper has been successfully applied to the event-driven module and the event tracking module of the "Sniffing Taurus Financial Platform" and has been well received in the field.
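The trigger-word pre-classification described in item (4) can be sketched as follows. This is a minimal, standard-library-only Python illustration under stated assumptions, not the implementation used in this paper: the `AhoCorasick` class, the `TRIGGERS` dictionary with its category labels, and `pre_classify` are all hypothetical. The automaton scans each sentence once and reports every trigger word it contains; a sentence with no match is treated as a non-event sentence and dropped before any later classification stage.

```python
from collections import deque


class AhoCorasick:
    """Minimal character-level Aho-Corasick automaton (illustrative sketch)."""

    def __init__(self, patterns):
        self.goto = [{}]    # goto[state][char] -> next state; state 0 is the root
        self.fail = [0]     # failure link of each state
        self.output = [[]]  # patterns recognized on reaching each state
        for p in patterns:
            self._add(p)
        self._build_failure_links()

    def _add(self, pattern):
        # Insert one pattern into the trie, creating states as needed.
        state = 0
        for ch in pattern:
            if ch not in self.goto[state]:
                self.goto.append({})
                self.fail.append(0)
                self.output.append([])
                self.goto[state][ch] = len(self.goto) - 1
            state = self.goto[state][ch]
        self.output[state].append(pattern)

    def _build_failure_links(self):
        # Breadth-first traversal: depth-1 states keep their failure link at the root.
        queue = deque(self.goto[0].values())
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                # Inherit patterns that end at the failure state (suffix matches).
                self.output[nxt] = self.output[nxt] + self.output[self.fail[nxt]]

    def find(self, text):
        """Yield (end_index, pattern) for every occurrence, in a single pass."""
        state = 0
        for i, ch in enumerate(text):
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            for p in self.output[state]:
                yield i, p


# Hypothetical trigger dictionary: trigger word -> event category.
TRIGGERS = {
    "收购": "acquisition",
    "增持": "shareholding_increase",
    "减持": "shareholding_decrease",
    "中标": "contract_win",
}


def pre_classify(sentence, automaton, triggers=TRIGGERS):
    """Return candidate event categories; an empty set marks a non-event sentence."""
    return {triggers[word] for _, word in automaton.find(sentence)}


ac = AhoCorasick(TRIGGERS)
print(pre_classify("公司拟收购某科技公司", ac))  # {'acquisition'}
print(pre_classify("今日股市平稳", ac))          # set()
```

Because the automaton works character by character, it can run on raw Chinese text without prior word segmentation, and matching time grows linearly with sentence length plus the number of matches, independent of how many trigger words the dictionary holds.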
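The trigger-word expansion described in item (3) can be approximated as below. As an assumption, this sketch replaces the keyword clustering step with a simpler seed-based nearest-neighbour assignment: each candidate word joins the category of its most similar seed trigger if the cosine similarity reaches a threshold. The seed lists, the toy three-dimensional vectors, the 0.8 threshold, and the function names are all hypothetical; in the pipeline described above, the vectors would instead come from a Word2Vec model trained on the segmented news corpus.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def expand_triggers(seeds, vectors, threshold=0.8):
    """Add each non-seed word to the category of its most similar seed trigger,
    provided that similarity is at least `threshold` (hypothetical default).

    seeds:   {category: [seed trigger words]}
    vectors: {word: embedding}; in practice, taken from a trained Word2Vec model
    """
    all_seeds = {s for words in seeds.values() for s in words}
    expanded = {cat: set(words) for cat, words in seeds.items()}
    for word, vec in vectors.items():
        if word in all_seeds:
            continue  # seeds already belong to their own category
        best_cat, best_sim = None, threshold
        for cat, words in seeds.items():
            for seed in words:
                if seed in vectors:
                    sim = cosine(vec, vectors[seed])
                    if sim >= best_sim:
                        best_cat, best_sim = cat, sim
        if best_cat is not None:
            expanded[best_cat].add(word)
    return expanded


# Toy vectors standing in for real Word2Vec embeddings (made up for illustration).
VECTORS = {
    "收购": [0.9, 0.1, 0.0],     # acquire
    "并购": [0.85, 0.15, 0.05],  # merge and acquire
    "增持": [0.1, 0.9, 0.0],     # increase holdings
    "买入": [0.2, 0.8, 0.1],     # buy
    "天气": [0.0, 0.1, 0.95],    # weather (unrelated)
}
SEEDS = {"acquisition": ["收购"], "shareholding_increase": ["增持"]}

print(expand_triggers(SEEDS, VECTORS))
```

With these toy vectors, 并购 joins the acquisition category and 买入 joins shareholding_increase, while 天气 stays unassigned because its similarity to every seed is far below the threshold. A real system would also need to tune the threshold against a labeled sample, since it trades trigger coverage against noise.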