Research on Macro Index Prediction Based on Search Data

Posted on: 2017-01-19
Degree: Master
Type: Thesis
Country: China
Candidate: G M Li
Full Text: PDF
GTID: 2348330488959952
Subject: Software engineering
Abstract/Summary:
The rapid development of the mobile Internet has ushered in an era of data explosion. Search service providers have accumulated vast amounts of search data, which reflect topics of public attention and move in line with macro indexes. This makes it possible to study methods that predict macro indexes from search data, work that has considerable scientific and practical value.

The GFT (Google Flu Trends) model, built on Google's search data, made important contributions to influenza forecasting worldwide, confirmed the value of search data, and gave rise to multiple variants of the GFT model. As the largest search provider in China, Baidu logs the most comprehensive data on users' search behavior, which provides the basis for the BS-MIP (Macro Index Prediction based on Baidu Searches) model presented in this paper. BS-MIP can predict macro indexes automatically and avoids GFT's strong dependence on professional domain knowledge.

To avoid inaccuracy caused by missing features, earlier models introduce as many features as possible, which risks model failure due to feature redundancy. This paper presents a feature selection module built around the GA-Lasso (Genetic & Adaptive Lasso) method, which combines traditional feature selection with ideas from artificial intelligence and offers a practical scheme for addressing overfitting and the high-dimensional, small-sample-size problem.

Discretization of continuous data is an important part of data preprocessing and benefits model learning. However, because label information is lacking, discretization in unsupervised settings remains a challenge to be solved.
Based on the idea of ensembling, this paper presents a discretization module built around the KED (Kmeans based Ensembling Discretization) method, which provides efficient discretization.

The BS-MIP model integrates search data to nowcast macro indexes, providing an example for similar work in related fields. The GA-Lasso and KED methods ensure BS-MIP's availability and scalability, respectively. Both methods are also flexible, since they can be used as separate modules.
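
The abstract does not say how KED combines its k-means runs, so the following is only a rough sketch of the general idea of unsupervised, ensemble-style discretization and should not be read as the thesis's actual method. It assumes that one continuous feature is clustered several times with 1-D k-means from different random starts, and that the final bin boundaries are the per-position median of the cut points from all runs. The function names (`kmeans_1d`, `cut_points`, `ensemble_discretize`), the median rule, and the sample data are all hypothetical.

```python
import random
from statistics import median


def kmeans_1d(values, k, iters=50, seed=0):
    """Lloyd's algorithm on scalars; returns the k cluster centres, sorted."""
    rng = random.Random(seed)
    # Start from k distinct data values (requires k <= number of distinct values).
    centres = sorted(rng.sample(sorted(set(values)), k))
    for _ in range(iters):
        groups = [[] for _ in centres]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centres[i]))
            groups[nearest].append(v)
        # An empty group keeps its previous centre.
        new = sorted(sum(g) / len(g) if g else c for g, c in zip(groups, centres))
        if new == centres:
            break
        centres = new
    return centres


def cut_points(centres):
    """Bin boundaries are the midpoints between neighbouring centres."""
    return [(a + b) / 2 for a, b in zip(centres, centres[1:])]


def ensemble_discretize(values, k, runs=10):
    """Assumed combination rule: median cut point over several seeded runs."""
    all_cuts = [cut_points(kmeans_1d(values, k, seed=s)) for s in range(runs)]
    cuts = [median(run[i] for run in all_cuts) for i in range(k - 1)]
    # A value's bin index is the number of cut points lying below it.
    labels = [sum(v > c for c in cuts) for v in values]
    return cuts, labels


# Hypothetical search-volume readings for one keyword.
values = [1.0, 1.2, 0.9, 5.0, 5.3, 4.8, 9.9, 10.2, 10.0]
cuts, labels = ensemble_discretize(values, k=3, runs=7)
print(cuts, labels)
```

The median is used here instead of the mean so that a single run stuck in a poor local optimum has less influence on the final boundaries; other consensus rules would be equally plausible readings of "ensembling".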
Keywords/Search Tags: Search Data, Macro Index Prediction, Feature Selection, Discrete Processing