
Research On Data Annotation Of Bank Transaction Short Message Information Based On CRF Model

Posted on: 2019-05-01    Degree: Master    Type: Thesis
Country: China    Candidate: D Q Guo
GTID: 2358330542964347    Subject: Quantitative Economics
Abstract/Summary:
In recent years, the mobile Internet and mobile electronic devices have developed rapidly. People are no longer constrained by distance and can communicate with other people and with organizations through mobile devices anytime and anywhere. In 2016, the Pew Research Center reported that China's mobile phone penetration exceeded 96 per 100 people, its smartphone penetration exceeded 63%, and mobile Internet users accounted for more than 90% of Chinese Internet users. With the growth of the mobile Internet, SMS has shifted from its original role of person-to-person communication to interaction between people and institutions: organizations now send text messages to notify users of business events, such as bank transactions, ticket bookings, and call reminders. However, these messages are stored on the device as natural language and cannot be used directly for data analysis or research. The useful information in each message must therefore be extracted into structured data so that it can be analyzed and mined, and so that the user experience can be improved.

SMS data spans every industry, and each industry's messages have their own key information, terminology, and text formats. Among them, financial SMS are far more valuable to mine than messages from other industries, in terms of both the value and the relative volume of the information they carry. This thesis focuses on processing and analyzing financial-industry SMS, applying natural language processing to convert the unstructured text sent by banks into structured data suitable for statistical analysis. To meet downstream analysis needs, the extracted information covers the key fields of each message, such as the account, transaction amount, account balance, and transaction type.

For SMS information extraction, the task of extracting key transaction information is recast as a sequence labeling task that identifies the key information in the text, namely named entity recognition (NER). In Chinese natural language processing, most current research trains on public corpora such as Weibo and People's Daily, and NER typically identifies isolated entity types such as organizations, locations, and sentiment words. Little research takes a given type of event as its subject and identifies all of the information related to that event.

To reduce the complexity of NER on SMS text, the thesis first builds a word segmentation system based on an SMS corpus and the named entities of financial transactions. Words or phrases related to the named entities serve as the segmentation criteria, and the segmentation results are used as one of the features for entity extraction. Experiments that combine syntactic analysis of financial transaction messages with external-dictionary and part-of-speech features show that this method effectively improves the accuracy of entity extraction from financial transaction messages and reduces the complexity of recognizing the named entities. Finally, to structure bank transaction data, the thesis proposes an analysis method and system designed for strong extensibility and versatility.

The experimental results show that the NER system for bank transaction SMS performs well. For the word segmentation system, the precision of segmentation and part-of-speech tagging reaches 0.987, the recall 0.988, and the F1-score 0.987. The NER system achieves a precision of 0.96, a recall of 0.977, and an F1-score of 0.969. This thesis lays a solid theoretical and practical foundation for event-centered natural language processing tasks that extract the information related to an event.
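To make the sequence-labeling formulation concrete, the following is a minimal Python sketch, not the thesis's implementation. It assumes a character-level BIO tagging scheme and four hypothetical labels (ACCOUNT, AMOUNT, BALANCE, TYPE); the sample message, the `LEXICON` dictionary, and all function names are invented for illustration. It shows how annotated spans become per-character tags, how predicted tags would be decoded back into a structured record, and what per-character features (including an external-dictionary flag) a linear-chain CRF might consume. The abstract does not give the actual feature templates, so these features are assumptions, and no CRF is trained here.

```python
# Hypothetical sketch: bank-SMS field extraction framed as character-level BIO labeling.
# The sample message, label names, and LEXICON are invented for illustration;
# they are not the thesis's tag set, corpus, or dictionary.
from typing import Dict, List, Optional, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, label)


def spans_to_bio(text: str, spans: List[Span]) -> List[str]:
    """Turn labelled character spans into one BIO tag per character."""
    tags = ["O"] * len(text)
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def bio_to_record(text: str, tags: List[str]) -> Dict[str, str]:
    """Decode a BIO tag sequence back into a structured record (label -> text)."""
    record: Dict[str, str] = {}
    current: Optional[str] = None
    buf: List[str] = []
    for ch, tag in zip(text, tags):
        if current is not None and tag == f"I-{current}":
            buf.append(ch)  # continue the open entity
            continue
        if current is not None:
            record[current] = "".join(buf)  # close the open entity
        current, buf = None, []
        if tag.startswith("B-"):
            current, buf = tag[2:], [ch]  # open a new entity; stray I- tags are ignored
    if current is not None:
        record[current] = "".join(buf)
    return record


def char_features(text: str, i: int, lexicon: Set[str]) -> Dict[str, object]:
    """Per-character features of the kind a linear-chain CRF could consume (assumed)."""
    ch = text[i]
    return {
        "char": ch,
        "is_digit": ch.isdigit(),
        "prev_char": text[i - 1] if i > 0 else "<BOS>",
        "next_char": text[i + 1] if i + 1 < len(text) else "<EOS>",
        # external-dictionary feature: does a lexicon entry start at this position?
        "dict_start": any(text.startswith(w, i) for w in lexicon),
    }


def f1(precision: float, recall: float) -> float:
    """F1-score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


LEXICON = {"尾号", "账户", "支出", "人民币", "余额"}  # hypothetical dictionary
sms = "您尾号1234的账户支出人民币500.00元，余额8000.00元。"  # invented message


def span_of(sub: str, label: str) -> Span:
    """Locate the first occurrence of `sub` in the sample message as a labelled span."""
    start = sms.index(sub)
    return (start, start + len(sub), label)


gold = [span_of("1234", "ACCOUNT"), span_of("支出", "TYPE"),
        span_of("500.00", "AMOUNT"), span_of("8000.00", "BALANCE")]
tags = spans_to_bio(sms, gold)
record = bio_to_record(sms, tags)
features = [char_features(sms, i, LEXICON) for i in range(len(sms))]

print(record)
print(round(f1(0.96, 0.977), 3))
```

In a real pipeline, a trained CRF would predict `tags` from `features`, and `bio_to_record` would turn those predictions into the account, amount, balance, and type fields. The `f1` helper also serves as a sanity check on the reported figures: a precision of 0.96 and a recall of 0.977 give an F1 of about 0.968, consistent with the reported 0.969 once rounding of the published precision and recall is allowed for.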
Keywords/Search Tags: Bank transaction SMS, word segmentation, part-of-speech tagging, named entity recognition, conditional random field