Study On The Algorithm Of Association Rules In Financial High Frequency Data

Posted on: 2009-06-23
Degree: Master
Type: Thesis
Country: China
Candidate: X W Li
Full Text: PDF
GTID: 2178360242981606
Subject: Software engineering
Abstract/Summary:
With the widespread use of modern computer technology in financial trading, trading data for stocks, exchange rates, options, financial derivatives and so on can be recorded in real time. The whole trading process is stored either transaction by transaction or second by second, and the result is financial high-frequency data. Such data are massive: the data of every minute reach a magnitude of 1,000,000 within 10 years. Because China's stock markets are still young and the construction of high-frequency financial databases has lagged behind, research on high-frequency data is only beginning. Establishing suitable tools and methods, structural models and forecasting models for the Chinese stock market is becoming more and more important.
Data mining is an interdisciplinary field that has expanded rapidly at home and abroad in recent years. It draws on databases, statistics, artificial intelligence, machine learning and other domains. As database technology develops and data applications multiply, the volume of accumulated data is growing exponentially, and data mining uses techniques from these disciplines to process very large data volumes. Association analysis is one of the earliest forms of data mining. It finds relations between different commodities (items) in a transaction database; the resulting rules reflect customers' purchasing patterns, such as how buying one commodity influences the purchase of another. Its successful applications in business have made association rule mining one of the most mature, important and active research topics in data mining.
Stock analysis and forecasting are important application domains of data mining, and many scholars and companies are engaged in research on and application of data mining for this purpose.
The primary contribution of this thesis is an improvement to the Apriori algorithm that takes the enormous volume of high-frequency financial data into account. The transaction database is transformed into a character table whose element values are "+" or "-". To compute the support of a k-item set, we only need to perform an "and" operation on any k column vectors of the table; the number of results that are "+" or "-" is the support of the k-item set. A transaction whose elements in the k-item set are not all "+" or all "-" is assigned the value "0". Transactions that contain only a single k-item set are deleted, because such a transaction cannot contain a (k+1)-item set formed by joining k-item sets. Processing the data in this way reduces the number of operations and saves time when the database is scanned to compute support.
Starting from both the fundamental and the technical aspects of stock analysis, the thesis uses the concepts and methods of data mining to study financial high-frequency data with the Apriori association rule algorithm, applied mainly to stock data.
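The column-wise support count described above can be sketched as follows. This is only one reading of the abstract, not the thesis's implementation: it assumes that rows are trading periods, columns are stocks, "+" and "-" mean that the price rose or fell, and that the "and" operation keeps a symbol only when all k columns share it. The table contents and the names `TABLE`, `agreement_mask` and `support` are hypothetical.

```python
# Hypothetical toy table: one string per stock, one character per trading period.
# Assumption: "+" means the price rose in that period, "-" means it fell.
TABLE = {
    "A": "++-+-+",
    "B": "++-+--",
    "C": "-+-++-",
}


def agreement_mask(table, items):
    """Combine the chosen columns row by row.

    A row keeps its symbol when every chosen column shows the same
    "+" or "-"; otherwise the row is marked "0".
    """
    columns = [table[name] for name in items]
    mask = []
    for symbols in zip(*columns):
        first = symbols[0]
        same = first in "+-" and all(s == first for s in symbols)
        mask.append(first if same else "0")
    return "".join(mask)


def support(table, items):
    """Count the rows that are still "+" or "-" after combining the columns."""
    return sum(1 for s in agreement_mask(table, items) if s != "0")


print(agreement_mask(TABLE, ("A", "B")))  # ++-+-0
print(support(TABLE, ("A", "B")))         # 5
print(support(TABLE, ("A", "B", "C")))    # 3
```

Under these assumptions the support of a k-item set is obtained from k columns alone, without rescanning whole transactions.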
Because certain stock prices show similar or opposite tendencies over a certain period, mining the association rules among these data or stocks lets investors learn the trend of each stock and the relations between stocks, and then make sound investment decisions.
The first part covers background knowledge of stocks. It introduces fundamental analysis and technical analysis in detail, briefly explains stock price indices and stock market forecast variables, and describes the present state and development direction of securities analysis software.
The second part gives a comprehensive analysis of financial high-frequency data, covering their main idea and the merits of analysing stocks with high-frequency data. High-frequency data are intraday data, collected mainly at frequencies of hours, minutes or seconds; by analysing them we can find the real reasons why prices fluctuate.
The third part describes the basic concepts and classification of association rules in data mining, and the principle and procedure of the Apriori algorithm: frequent item sets are mined with Apriori, and association rules are then created from the frequent item sets.
In the fourth part, based on the characteristics of high-frequency data, we filter the stock data to obtain sample data. We collected data related to stock prices, including price data of several typical stocks at 5-minute, 15-minute, 30-minute, 60-minute and daily frequencies.
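As an illustration of how price series at different sampling frequencies might be turned into "+"/"-" symbols, here is a minimal sketch. The abstract does not state the encoding rule, so everything here is an assumption: "+" is taken to mean a rise between consecutive closing prices, an unchanged price is treated as "-", and a lower frequency is obtained by keeping the last price of each block of bars. The prices and the function names are hypothetical.

```python
def encode_moves(prices):
    """Map consecutive closing prices to "+" (rise) or "-" (fall or unchanged)."""
    return "".join(
        "+" if later > earlier else "-"
        for earlier, later in zip(prices, prices[1:])
    )


def resample_closes(prices, step):
    """Keep the last price of every complete block of `step` consecutive bars."""
    return [prices[i + step - 1] for i in range(0, len(prices) - step + 1, step)]


# Hypothetical 5-minute closing prices for one stock.
five_minute = [10.0, 10.2, 10.1, 10.3, 10.3, 10.6, 10.4]

# Three 5-minute bars form one 15-minute bar.
fifteen_minute = resample_closes(five_minute, 3)

print(encode_moves(five_minute))     # +-+-+-
print(fifteen_minute)                # [10.1, 10.6]
print(encode_moves(fifteen_minute))  # +
```

In this toy series the 5-minute symbols alternate, while the 15-minute series shows only a single rise, which illustrates how a lower sampling frequency hides short-term movements.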
For financial high-frequency data, the confidence of the mined rules must be relatively high. Because the data volume is huge, the Apriori algorithm may generate massive candidate sets during computation and may need to scan the database repeatedly, which makes the computation very time-consuming. We therefore need to prune during the iterative process, following the principle that "an item set can be frequent only if all of its subsets are frequent". In view of these characteristics, we make some improvements to the Apriori algorithm in the data preprocessing stage. To support the search for frequent item sets, the pure transaction database is turned into a character table. To improve efficiency, the element values of the table are set to "+" or "-"; to compute the support of a k-item set, we perform an "and" operation on any k column vectors of the table, and the number of results that are "+" or "-" is the support of the k-item set. Transactions that contain only a single k-item set are deleted. Processing the data in this way reduces the number of operations needed to calculate the support of subsets. Afterwards, we use daily prices as a demonstration model, briefly describe how the frequent item sets of these stocks are produced, and finally create association rules from the frequent item sets obtained.
The fifth part analyses stock high-frequency data with the Apriori algorithm and describes in detail the discovery of frequent item sets and the resulting association rules between these stocks.
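The pruning principle and the deletion of transactions described above can be sketched with a textbook level-wise Apriori on ordinary set-valued transactions. This is a generic illustration, not the thesis's character-table variant; the function name `apriori`, the absolute `min_support` threshold and the toy transactions are assumptions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset: support count} for all frequent item sets."""
    rows = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for row in rows:
        for item in row:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(level)

    k = 1
    while level:
        # Transaction reduction: a row holding at most one frequent k-item set
        # cannot hold a (k+1)-item set whose k-subsets are all frequent.
        rows = [r for r in rows if sum(1 for s in level if s <= r) > 1]
        # Join step: unite pairs of frequent k-item sets into (k+1)-item sets.
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Prune step: every k-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k))
        }
        counts = {c: sum(1 for r in rows if c <= r) for c in candidates}
        level = {s: n for s, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
    return frequent


# Hypothetical transactions: the stocks that moved together in each period.
transactions = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"}]
frequent = apriori(transactions, min_support=3)
print(len(frequent))               # 6
print(frequent[frozenset("AB")])   # 3
```

With a threshold of 3, the three single items and the three pairs are frequent, while {A, B, C} appears only twice and is rejected.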
Applying the idea of the Apriori algorithm, we use the transaction dates of the stocks as the result of preprocessing, take the column number of each transaction date as the direct processing object of the algorithm and the transaction date itself as its indirect processing object, and can thus quickly and conveniently mine the rules that interest the user. We mine data of different frequencies and draw the following conclusion: the stock movements found differ with the data used. This indicates that, because high-frequency data record stock data collected at minute frequency, the higher the data frequency, the less information is lost.
High-frequency data contain more real-time information about the securities trading process and capture each small change in the stock market more accurately, so studying stock prices with high-frequency data is superior to using low-frequency data. Investors pay attention to the short-term trend of a stock in order to find the best moment to buy or sell. Because certain financial high-frequency data show similar or opposite tendencies over a certain period, mining them helps investors understand the trend of each stock and the relations between stocks, further analyse the policies and plans of listed companies, and thus make sound investment decisions.
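The step from frequent item sets to the rules that interest the user can be sketched with the standard confidence measure, confidence(X -> Y) = support(X and Y together) / support(X). The abstract does not give the thesis's thresholds or results, so the support counts, the stock names and the function name `generate_rules` below are hypothetical.

```python
from itertools import combinations


def generate_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) for rules meeting the threshold.

    `frequent` maps each frequent item set (a frozenset) to its support count
    and must also contain every subset of each item set it holds.
    """
    rules = []
    for itemset, count in frequent.items():
        if len(itemset) < 2:
            continue
        for size in range(1, len(itemset)):
            for left in combinations(sorted(itemset), size):
                antecedent = frozenset(left)
                confidence = count / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules


# Hypothetical support counts for three stocks A, B and C.
frequent = {
    frozenset({"A"}): 4,
    frozenset({"B"}): 4,
    frozenset({"C"}): 3,
    frozenset({"A", "B"}): 3,
    frozenset({"A", "C"}): 3,
}

for antecedent, consequent, confidence in generate_rules(frequent, 0.8):
    print(sorted(antecedent), "->", sorted(consequent), confidence)
# ['C'] -> ['A'] 1.0
```

Raising `min_confidence` keeps only the strongest rules, in line with the requirement that rules mined from high-frequency data have relatively high confidence.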
Keywords/Search Tags: Association rules, Apriori algorithm, high-frequency data, stock