
Application Of Data Warehouse Technology In Achieving Preferential Information Aggregation

Posted on: 2015-03-10
Degree: Master
Type: Thesis
Country: China
Candidate: H W Zhou
GTID: 2298330422987025
Subject: Signal and Information Processing
Abstract/Summary:
To attract more consumers, merchants release a variety of preferential (discount) policies. These offers usually originate from different sites and are scattered across the Internet, so they are hard for consumers to find. Preferential information aggregation can be achieved through information collection and processing, and data warehouse technology is the most important part of that processing. This thesis researches and demonstrates a data warehouse in the context of preferential information aggregation. When collecting and consolidating information from the Internet, we found that it contains numerous uncertainties, so an ETL system is needed to match information from different sources and ensure that normalized data based on POIs (points of interest) is ultimately offered to users. Big data analytics can also provide decision-making information to merchants.

The data warehouse research covers the overall design and dimensional modeling. This thesis gives particular emphasis to the source data and the ETL system, which comprises collecting, extracting, matching, cleaning, transforming and loading the preferential information. The data collection stage mainly addresses data acquisition problems, such as the anti-crawling measures of source sites, cookie restrictions and caps on the number of visits from a single IP address. The data extraction stage focuses on scheduled, incremental extraction of discount data, using timestamps to capture changed data. The data matching stage focuses on POI matching driven primarily by name and address and supplemented by telephone number, latitude and longitude. The algorithms adopted include Chinese word segmentation based on dictionaries and statistics, similarity detection, and recognition of abbreviations and habitual typos in street names, among others.
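The collection stage has to cope with sites that cap how many visits a single IP address may make. As a minimal sketch, assuming a fixed per-site limit, a collector could space out its own requests with a sliding-window throttle. The class name `SlidingWindowThrottle` and its `max_requests` and `window` parameters are hypothetical; the thesis does not state its limits or its mechanism.

```python
import time
from collections import deque


class SlidingWindowThrottle:
    """Permit at most `max_requests` requests in any `window`-second span.

    Hypothetical helper: the limits must be chosen per site; the thesis
    does not state them.
    """

    def __init__(self, max_requests, window, clock=time.monotonic):
        self.max_requests = max_requests
        self.window = window
        self.clock = clock   # injectable so the sketch can be tested
        self.sent = deque()  # timestamps of requests still inside the window

    def wait_time(self):
        """Seconds until another request is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Forget requests that have slid out of the window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.max_requests:
            return 0.0
        return self.window - (now - self.sent[0])

    def acquire(self, sleep=time.sleep):
        """Block until a slot is free, then record this request."""
        while (delay := self.wait_time()) > 0:
            sleep(delay)
        self.sent.append(self.clock())
```

Calling `acquire()` before each page fetch keeps the collector below the cap; the injectable `clock` and `sleep` arguments exist only to make the sketch testable.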
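Timestamp-based incremental extraction can be illustrated with a watermark: each run pulls only rows whose modification time is later than the newest timestamp seen in the previous run. The sketch below assumes a hypothetical source table `discounts(id, title, updated_at)` with ISO-8601 text timestamps and uses the standard-library `sqlite3` module; the thesis does not specify its storage or schema.

```python
import sqlite3


def extract_changes(conn, last_watermark):
    """Return rows changed since `last_watermark` and the new watermark.

    Assumes a hypothetical table discounts(id, title, updated_at) whose
    updated_at values are ISO-8601 strings, so text order equals time order.
    """
    rows = conn.execute(
        "SELECT id, title, updated_at FROM discounts "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_watermark,),
    ).fetchall()
    # Advance to the newest timestamp seen; keep the old one if nothing changed.
    new_watermark = rows[-1][2] if rows else last_watermark
    return rows, new_watermark
```

A strict `>` comparison is simple but can miss a row committed later with exactly the watermark's timestamp; pipelines that worry about this often re-read a small overlap window and deduplicate downstream.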
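For the dictionary-based part of Chinese word segmentation, forward maximum matching is one common technique: at each position, take the longest dictionary word that starts there, and fall back to a single character when nothing matches. The sketch below shows only that technique with a toy dictionary; it is not necessarily the algorithm used in the thesis, and the statistical component the abstract mentions (for example, resolving ambiguous splits by word frequency) is omitted.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Segment `text` by forward maximum matching against `dictionary`."""
    if max_len is None:
        max_len = max((len(w) for w in dictionary), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; a single character always matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words
```

With the toy dictionary `{"南京", "南京市", "长江", "长江大桥", "市长", "大桥"}`, segmenting `"南京市长江大桥"` yields `["南京市", "长江大桥"]`, because the longest match is always preferred at each position.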
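POI matching driven by name and address, with telephone number and coordinates as supporting evidence, can be sketched as a weighted score. Everything specific here is an assumption for illustration: the weights (0.5 name, 0.3 address, 0.1 phone, 0.1 proximity), the 200 m distance cutoff, the 0.75 threshold, the hypothetical `ALIASES` typo table, and the use of `difflib.SequenceMatcher` as the string-similarity measure.

```python
import math
import unicodedata
from difflib import SequenceMatcher

# Hypothetical correction table for habitual typos and abbreviations in street names.
ALIASES = {"中三路": "中山路"}


def normalize(text):
    """Fold full-width characters (NFKC), drop whitespace, apply alias fixes."""
    text = "".join(unicodedata.normalize("NFKC", text).split())
    for wrong, right in ALIASES.items():
        text = text.replace(wrong, right)
    return text


def similarity(a, b):
    """String similarity in [0, 1] after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def haversine_m(lat1, lon1, lat2, lon2):
    """Approximate great-circle distance in metres (spherical Earth)."""
    r = 6_371_000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def match_score(a, b):
    """Name and address dominate; phone and location only add small bonuses."""
    score = 0.5 * similarity(a["name"], b["name"]) + 0.3 * similarity(a["address"], b["address"])
    if a.get("phone") and a.get("phone") == b.get("phone"):
        score += 0.1
    coords = (a.get("lat"), a.get("lon"), b.get("lat"), b.get("lon"))
    if None not in coords and haversine_m(*coords) < 200:
        score += 0.1
    return score


def is_same_poi(a, b, threshold=0.75):
    return match_score(a, b) >= threshold
```

Because phone and proximity together contribute at most 0.2, they can confirm a match but cannot create one from dissimilar names and addresses, which mirrors the primary/supplementary split described in the abstract.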
In the data cleaning stage, the results of correlating and matching the various preferential sources against POI information are used to clean missing, erroneous and duplicate records in the preferential information. The data transformation and loading stage primarily converts the preferential data into the same format as the destination table in the data warehouse and then loads it into the data warehouse (DW). Finally, this thesis tests and verifies the feasibility of this approach and introduces the development of reports.
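Cleaning with the matching results can be sketched as follows, assuming each offer has already been assigned a `poi_id` by the matching stage and that a POI master table is trusted for address and phone. The field names, the title-normalization rule used to detect repeats, and the choice to drop unmatched offers are all hypothetical.

```python
def clean_offers(offers, poi_master):
    """Deduplicate offers and repair their POI fields from a matched POI table.

    offers: list of dicts with hypothetical keys poi_id, title, address, phone.
    poi_master: dict mapping poi_id -> {"address": ..., "phone": ...}.
    """
    seen = set()
    cleaned = []
    for offer in offers:
        poi = poi_master.get(offer.get("poi_id"))
        if poi is None:
            continue  # unmatched; a real pipeline might queue it for review instead
        # Repeats: same POI and same title after collapsing spaces and case.
        key = (offer["poi_id"], " ".join(offer["title"].split()).lower())
        if key in seen:
            continue
        seen.add(key)
        fixed = dict(offer)
        # Missing or conflicting POI attributes are taken from the matched POI.
        for field in ("address", "phone"):
            if poi.get(field):
                fixed[field] = poi[field]
        cleaned.append(fixed)
    return cleaned
```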
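Transformation and loading can be sketched as mapping each source's record layout onto one target row format and then upserting it into the warehouse table. The sketch assumes a hypothetical fact table `fact_offer`, hypothetical source field names (`poi_id`, `title`, `start`, `end`), a small list of date formats that different sites might use, and SQLite as a stand-in for the actual data warehouse, which the abstract does not name.

```python
import sqlite3
from datetime import datetime

# Hypothetical date layouts that different source sites might use.
DATE_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%Y年%m月%d日")


def to_iso_date(raw):
    """Convert a source date string to ISO 8601 (YYYY-MM-DD)."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {raw!r}")


def transform_offer(record, source):
    """Map one cleaned source record onto the target row layout."""
    return (
        record["poi_id"],
        record["title"].strip(),
        to_iso_date(record["start"]),
        to_iso_date(record["end"]),
        source,
    )


def load(conn, rows):
    """Create the target table if needed and upsert the transformed rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_offer ("
        "poi_id TEXT, title TEXT, start_date TEXT, end_date TEXT, source TEXT, "
        "PRIMARY KEY (poi_id, title, source))"
    )
    conn.executemany("INSERT OR REPLACE INTO fact_offer VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
```

`INSERT OR REPLACE` on the composite primary key makes reloading the same offer idempotent, which matters when incremental extraction re-reads overlapping data.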
Keywords/Search Tags: Information Aggregation, Data Warehouse, Big Data Analytics, Chinese Word Segmentation, Data Cleaning, Point of Interest