Research On Market Data Extraction And Forecast On The Web

Posted on: 2008-07-04
Degree: Master
Type: Thesis
Country: China
Candidate: C Y Yu
Full Text: PDF
GTID: 2178360242460752
Subject: Computer application technology
Abstract/Summary:
With the rapid development of Web technology, the World Wide Web has become the most comprehensive information resource, and Web mining has become a hot topic in data mining. The Web contains large amounts of market data in dynamic tables. The extraction and prediction of such dynamic Web data have both theoretical significance and practical applications. This thesis studies market data extraction and prediction on the Web. The main contributions are as follows:
(1) A market data extraction algorithm for the Web and a meta-data extraction algorithm for Web pages are proposed. Both exploit HTML syntax and the design principles of Web pages. Based on the common practice that "market data are usually displayed in the largest table on a Web page", the market data extraction algorithm first detects the largest table on a Web page, then converts it into a DOM tree, and finally reads the node values of the tree. Unlike traditional algorithms, it extracts market data automatically and does not require users to specify a data extraction region. To describe the extracted page, a meta-data description model is designed for the meta-data extraction algorithm. The meta-data extraction algorithm exploits Web page structure and performs efficient extraction using regular expressions. Experimental results demonstrate that both algorithms achieve satisfactory efficiency.
(2) A study on Web market data prediction is conducted. Market data prediction can be divided into long-term prediction (longer than one year) and short-term prediction (shorter than one year). After agricultural product price data are collected using the above Web data extraction algorithm, different time series prediction models and different sample data are applied to both long-term and short-term prediction, in order to examine how sample data and prediction models influence prediction performance. Experimental results show that a linear model with seasonal changes applied to long-term data performs better for long-term prediction, and that the Holt-Winters model with seasonal changes applied to short-term data performs better for short-term prediction.
(3) A prototype system for agricultural product price prediction is designed and developed. The system automatically extracts market price data from a given website every day, displays the chosen market data in plots, and chooses a prediction model to perform prediction based on a user-specified prediction interval.
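The "largest table" heuristic described in (1) can be illustrated with a minimal sketch. This is not the thesis's implementation. It assumes that "largest" means the table with the most cells (the abstract does not define the measure), and it uses Python's event-based html.parser instead of building a DOM tree. The names TableCollector, largest_table and SAMPLE_PAGE are hypothetical, and the sample prices are made up for illustration.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect every <table> on a page as a list of rows of cell text."""

    def __init__(self):
        super().__init__()
        self.tables = []   # finished tables, each a list of rows
        self._stack = []   # tables currently open (supports nesting)
        self._cell = None  # text fragments of the open cell, or None

    def _flush_cell(self):
        # Store the open cell (if any) in the current row of the open table.
        if self._cell is not None and self._stack:
            text = " ".join("".join(self._cell).split())
            self._stack[-1][-1].append(text)
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._flush_cell()
            self._stack.append([])
        elif not self._stack:
            return  # ignore tags outside any table
        elif tag == "tr":
            self._flush_cell()
            self._stack[-1].append([])
        elif tag in ("td", "th"):
            self._flush_cell()  # tolerate a missing </td>
            if not self._stack[-1]:
                self._stack[-1].append([])  # cell without an enclosing <tr>
            self._cell = []

    def handle_endtag(self, tag):
        if not self._stack:
            return
        if tag in ("td", "th", "tr"):
            self._flush_cell()
        elif tag == "table":
            self._flush_cell()
            self.tables.append(self._stack.pop())

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def largest_table(html):
    """Return the rows of the table with the most cells (assumed measure)."""
    parser = TableCollector()
    parser.feed(html)
    parser.close()
    if not parser.tables:
        return []
    return max(parser.tables, key=lambda rows: sum(len(r) for r in rows))


# Made-up page: a small navigation table and a larger price table.
SAMPLE_PAGE = """
<html><body>
<table><tr><td>Home</td><td>About</td></tr></table>
<table>
  <tr><th>Product</th><th>Market</th><th>Price</th></tr>
  <tr><td>Wheat</td><td>Market A</td><td>1.52</td></tr>
  <tr><td>Corn</td><td>Market B</td><td>1.31</td></tr>
</table>
</body></html>
"""

if __name__ == "__main__":
    for row in largest_table(SAMPLE_PAGE):
        print(row)
```

Because the table is chosen by size alone, no extraction region needs to be specified by the user, which is the property the abstract emphasizes. A different size measure, such as row count or text length, could be substituted in the key function of largest_table.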
Keywords/Search Tags: Web mining, Web text mining, Web market data extraction, time series forecasting