
Study On Increment Algorithm Of Association Rules In Financial High Frequency Data

Posted on: 2011-06-22
Degree: Master
Type: Thesis
Country: China
Candidate: L Chen
GTID: 2178360305454837
Subject: Computer application technology
Abstract/Summary:
In recent years, with the wide adoption of modern computer technology and the falling cost of recording and storing data, research on massive data sets has deepened. In particular, as computing algorithms keep improving and large databases become common in the financial field, attention has turned to financial high-frequency data: real-time trading data for stocks, foreign exchange, futures and financial derivatives. Such data is very large; in the foreign exchange market, for example, minute-level data can reach 1,000,000 records in 10 years.
Compared with traditional low-frequency data, financial high-frequency data is collected at a higher frequency and contains more information available for mining. Some unique features have been found in it, which has in turn drawn growing interest from academics and financial practitioners. In China, financial high-frequency databases were established late; at this stage few scholars and professionals have studied them in depth, and useful results are scarce. As China's financial market develops day by day, establishing an appropriate financial high-frequency data model and finding highly efficient methods of analysis have become urgent. This thesis collects and analyzes a large amount of high-frequency data from the stock market.
Data mining technology has developed rapidly in recent years, at home and abroad, as a cross-disciplinary field involving artificial intelligence, machine learning, databases, statistics and other areas. With the rapid development of information technology and the spread of databases and data applications, the amount of accumulated data is growing exponentially and easily reaches the terabyte (TB) scale. How to extract useful knowledge from this mass of data has become a priority.
Data mining is a data processing technique developed to meet this need. It extracts, from large amounts of incomplete, random, fuzzy and noisy data, the implicit information and knowledge that people do not know in advance but that is potentially useful. The main tasks of data mining are association analysis, cluster analysis, classification, forecasting, time series modeling and deviation analysis.
Association analysis is the pioneer of data mining and was first applied to the market basket problem: finding, in a database of commodity transactions, the links between different commodities that reflect customers' purchasing patterns. Association rule mining has since been used successfully in the commercial, financial and telecommunications sectors and has become one of the most active and important research topics in data mining.
With the prosperity and development of China's stock market in recent years, many scholars and stock investors have been actively researching and applying data mining techniques to stock analysis and prediction. This thesis uses association rule mining to uncover potential information hidden in stock data, in particular the linkages between different stocks, in order to provide guidance for investment decisions.
At present, many association rule mining techniques assume a static transaction database. However, as association rule mining spreads to different fields, static mining no longer satisfies demand, so association rule algorithms need real-time incremental updating. Traditional incremental update algorithms are based on the Apriori algorithm, which cannot avoid repeated scans of the entire transaction database. For databases as large as those of financial high-frequency data this is a bottleneck that seriously reduces mining efficiency.
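To make the scanning cost concrete, the following is a minimal sketch of level-wise (Apriori-style) frequent itemset mining that also counts how many counting passes it makes over the database. It is a textbook-style illustration, not code from the thesis; the function name `apriori`, the `min_support` parameter and the toy shopping data are assumptions chosen for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return ({itemset: count}, number of counting passes over the data).

    `transactions` is a list of sets; `min_support` is a fraction in (0, 1].
    """
    min_count = min_support * len(transactions)
    scans = 0
    frequent = {}
    # Level-1 candidates: every single item seen in the data
    # (collected up front here for brevity).
    candidates = {frozenset([item]) for t in transactions for item in t}
    k = 1
    while candidates:
        # One full counting pass over the database per level.
        scans += 1
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        # Prune step: keep a candidate only if all its k-subsets are frequent.
        k += 1
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in level for s in combinations(union, k - 1)
            ):
                candidates.add(union)
    return frequent, scans


# Hypothetical toy database of shopping baskets.
db = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent_sets, passes = apriori(db, min_support=0.5)
```

Every level costs one pass over all transactions, so rerunning such a procedure each time a record arrives becomes expensive as the database grows.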
Therefore, this thesis improves the incremental updating algorithm for association rules so that mining efficiency is greatly increased.
Before improving the algorithm, the financial high-frequency data model is first described and improved. The thesis uses 5-minute stock data from the Shenzhen and Shanghai stock markets, taking eight representative stocks from the pharmaceutical sector over the same period for the simulation. As trading continues, a new transaction record is inserted into the original database every 5 minutes. To make it easy to judge whether a stock rose or fell relative to the previous 5 minutes, the 5-minute transaction data must be preprocessed. First, the price is extracted from each 5-minute record and compared with the price of the previous 5 minutes. If the current price is higher than the previous one, the change is marked with the character U; otherwise it is marked with the character D. Finally, the change marks of the eight stocks are appended to each record as a sequence string, such as DDUUDUUU, in which each character represents the change of one stock.
This treatment has two advantages. On the one hand, when the database is scanned for stocks that rose or fell at the same time, the change no longer has to be recalculated; the check can be completed directly by string matching. On the other hand, each 5-minute insertion adds only a single record holding the changes of the eight stocks, so the inserted data has size 1 and every itemset contained in the inserted data is frequent within that increment.
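The preprocessing step can be sketched in a few lines. This is an illustration under stated assumptions rather than the thesis's implementation: the function names are hypothetical, prices are passed as plain lists in a fixed stock order, and an unchanged price is encoded as D, following the "otherwise D" rule in the text.

```python
def encode_changes(prev_prices, curr_prices):
    """Encode one 5-minute record as a 'U'/'D' string, one character per stock."""
    if len(prev_prices) != len(curr_prices):
        raise ValueError("both price lists must cover the same stocks")
    # 'U' only when the price rose; equal or lower prices become 'D'.
    return "".join(
        "U" if curr > prev else "D"
        for prev, curr in zip(prev_prices, curr_prices)
    )


def build_records(price_rows):
    """Turn consecutive rows of prices into a list of change strings."""
    return [encode_changes(a, b) for a, b in zip(price_rows, price_rows[1:])]


def moved_together(record, positions, direction="U"):
    """Check by direct character matching whether the chosen stocks all moved one way."""
    return all(record[i] == direction for i in positions)


# Hypothetical prices for eight stocks in two consecutive 5-minute bars.
previous = [10.0] * 8
current = [9.9, 9.8, 10.1, 10.2, 9.9, 10.3, 10.1, 10.4]
record = encode_changes(previous, current)  # "DDUUDUUU"
```

Once a record is stored in this form, a question such as "did the third and fourth stocks rise together?" becomes `moved_together(record, [2, 3])`, with no price arithmetic at query time.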
This improved financial high-frequency data model not only simplifies database searches but also makes it convenient for the algorithm to use pattern matching.
Traditional incremental update algorithms are mainly based on the classic Apriori algorithm and must scan the whole database repeatedly, so an inherent weakness becomes evident: when every itemset in the newly added data is frequent, any of them that is not frequent in the original database forces a scan of the entire current database to determine whether it is now frequent. The traditional FUP algorithm therefore cannot avoid scanning the original database. To address this, the thesis introduces two collections, the second-choice frequent itemsets and the optional frequent itemsets, to temporarily store itemsets that are not currently frequent but may become frequent once a certain amount of new data has arrived.
Of course, an unlimited number of non-frequent itemsets cannot be stored, so how often the database must be rescanned has to be decided. Given the characteristics of 5-minute data, with 48 five-minute records per trading day, a global scan of the entire database is performed after every 48 added records. By maintaining these collections, the algorithm not only reduces the size of the frequent itemsets but also significantly reduces the number of full database scans, thus greatly improving mining efficiency.
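The idea of keeping near-frequent itemsets between full scans can be sketched as follows. This is a simplified illustration, not the algorithm of the thesis: it merges the two collections into a single tracked set governed by a hypothetical lower threshold `secondary_support`, limits itemsets to `max_size` items, and treats each (stock position, U/D character) pair as an item. All class, method and parameter names are assumptions; only the rescan interval of 48 records comes from the text.

```python
from collections import Counter
from itertools import combinations


class IncrementalMiner:
    """Track frequent and near-frequent itemsets over 'U'/'D' change strings."""

    def __init__(self, records, min_support=0.5, secondary_support=0.3,
                 rescan_every=48, max_size=2):
        self.records = list(records)   # full history, read only by rescans
        self.min_support = min_support
        self.secondary_support = secondary_support
        self.rescan_every = rescan_every
        self.max_size = max_size
        self.tracked = Counter()       # counts of frequent + near-frequent itemsets
        self.full_scans = 0
        self.inserts_since_scan = 0
        self._rescan()

    def _itemsets(self, record):
        # An item is a (stock position, 'U' or 'D') pair.
        items = list(enumerate(record))
        for k in range(1, self.max_size + 1):
            for combo in combinations(items, k):
                yield frozenset(combo)

    def _rescan(self):
        # Full pass: recount everything, keep itemsets above the lower threshold.
        counts = Counter()
        for record in self.records:
            counts.update(self._itemsets(record))
        floor = self.secondary_support * len(self.records)
        self.tracked = Counter({s: n for s, n in counts.items() if n >= floor})
        self.full_scans += 1
        self.inserts_since_scan = 0

    def insert(self, record):
        self.records.append(record)
        self.inserts_since_scan += 1
        if self.inserts_since_scan >= self.rescan_every:
            self._rescan()
        else:
            # Cheap path: touch only itemsets that are already tracked.
            for itemset in self._itemsets(record):
                if itemset in self.tracked:
                    self.tracked[itemset] += 1

    def frequent(self):
        floor = self.min_support * len(self.records)
        return {s: n for s, n in self.tracked.items() if n >= floor}
```

In this sketch an itemset that sits between the two thresholds can be promoted to frequent by new records without touching the history. An itemset below the lower threshold is not counted until the next full scan, so results between scans are approximate; that trade-off is what the rescan interval controls.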
The frequent itemsets that are mined can be used to extract association rules, which help investors understand the relationships between the trends of different stocks so that they can make sound investment decisions.
The first part mainly covers background knowledge on stocks, financial high-frequency data and data mining, paving the way for the study. The second part starts with the concept of association rules, then presents the classic Apriori algorithm and the traditional incremental update algorithm (the FUP algorithm) and points out their shortcomings in detail. The third part introduces the two concepts of second-choice and optional frequent itemsets, improves the incremental algorithm on that basis and describes it in detail. In the fourth part, the algorithm is simulated on 5-minute stock data and the mining results are analyzed and evaluated; the results show that the improved algorithm is efficient. The last part gives a summary and outlook.
Keywords/Search Tags: Association Rule, Financial High Frequency Data, the Algorithm of Increment Association Rules