Application Of Data Warehouse Technology In Achieving Preferential Information Aggregation

Posted on:2015-03-10

Degree:Master

Type:Thesis

Country:China

Candidate:H W Zhou

GTID:2298330422987025

Subject:Signal and Information Processing

Abstract/Summary:

To attract more consumers, merchants release a variety of preferential (discount) policies. These offers come from many different sites and are scattered across the Internet, so they are hard for consumers to find. Preferential information aggregation can be achieved through information collection and processing, and data warehouse technology is the most important part of that processing. This thesis studies and demonstrates the data warehouse in the context of preferential information aggregation. When collecting and consolidating information from the Internet, we find that it involves many uncertain situations, so an ETL system is needed to match information coming from different sources and to ensure that users are ultimately offered normalized data organized around points of interest (POIs). Big data analytics can also provide merchants with information to support decision-making.

The data warehouse research covers the overall design and dimensional modeling. This thesis places particular emphasis on the source data and the ETL system, which comprises collecting, extracting, matching, cleaning, transforming and loading the preferential information. The data collection stage mainly addresses data acquisition in the face of sites' anti-crawling measures, such as cookie restrictions and a maximum number of visits from the same IP address. The data extraction stage focuses on scheduled, incremental extraction of discount data, using timestamps to capture changed data. The data matching stage focuses on POI matching, driven mainly by name and address and supplemented by telephone number, latitude and longitude. The algorithms adopted include Chinese word segmentation based on dictionaries and statistics, similarity detection, and an algorithm for recognizing abbreviations and habitual typos in street names.
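The abstract does not say how the collector copes with per-IP visit caps. One common client-side approach is to keep each outgoing IP under a request budget per time window. The following is a minimal sketch assuming a sliding-window policy; the class name `PerIPThrottle`, the limits and the IP pool are hypothetical, and cookie handling is not shown.

```python
import time
from collections import defaultdict, deque


class PerIPThrottle:
    """Hypothetical sliding-window limiter: at most `max_requests` requests
    per `window` seconds for each outgoing IP address."""

    def __init__(self, max_requests, window, clock=time.monotonic):
        self.max_requests = max_requests
        self.window = window
        self.clock = clock  # injectable so the policy can be tested without sleeping
        self._history = defaultdict(deque)  # ip -> timestamps of recent requests

    def try_acquire(self, ip):
        """Record a request for `ip` and return True if it is within budget."""
        now = self.clock()
        hist = self._history[ip]
        # Forget requests that have slid out of the window.
        while hist and now - hist[0] >= self.window:
            hist.popleft()
        if len(hist) < self.max_requests:
            hist.append(now)
            return True
        return False

    def pick_ip(self, pool):
        """Return the first IP in `pool` that still has budget, or None."""
        for ip in pool:
            if self.try_acquire(ip):
                return ip
        return None
```

With a budget of two requests per minute, a third request from the same IP is refused and `pick_ip` falls through to the next address in the pool. A real crawler would additionally wait or back off when every IP is exhausted.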
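Timestamp-based incremental extraction can be sketched as a watermark query. Each scheduled run reads only rows whose modification time is later than the largest timestamp seen in the previous run. The table `source_deals`, its `updated_at` column and the helper names are hypothetical. SQLite stands in for whatever source store the thesis actually used, and a scheduler such as cron would call `extract_incremental` periodically.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory source table (hypothetical schema) for the demo."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE source_deals (id INTEGER PRIMARY KEY, title TEXT, updated_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO source_deals VALUES (?, ?, ?)",
        [
            (1, "20% off lunch", "2014-06-01 10:00:00"),
            (2, "Buy one get one free", "2014-06-02 09:30:00"),
        ],
    )
    return conn


def extract_incremental(conn, last_watermark):
    """Return rows changed after `last_watermark` plus the new watermark.

    Timestamps are stored as 'YYYY-MM-DD HH:MM:SS' strings, so lexical
    comparison in SQL matches chronological order.
    """
    rows = conn.execute(
        "SELECT id, title, updated_at FROM source_deals "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_watermark,),
    ).fetchall()
    new_watermark = rows[-1][2] if rows else last_watermark
    return rows, new_watermark


if __name__ == "__main__":
    conn = make_demo_db()
    rows, wm = extract_incremental(conn, "1970-01-01 00:00:00")  # first (full) run
    print(len(rows), wm)
    conn.execute(
        "UPDATE source_deals SET title = ?, updated_at = ? WHERE id = 1",
        ("30% off lunch", "2014-06-03 08:00:00"),
    )
    rows, wm = extract_incremental(conn, wm)  # next run sees only the changed row
    print(rows)
```

The strict `>` avoids re-reading the last extracted row, but it can miss a row written later with an identical timestamp. A production pipeline might overlap the window slightly and deduplicate downstream.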
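The dictionary-based half of the segmentation step can be illustrated with forward maximum matching, a classic dictionary method. The abstract does not name the exact algorithm, and the statistical component (for example, resolving ambiguous splits with word frequencies) is omitted. The toy dictionary is made up for the example.

```python
def fmm_segment(text, dictionary):
    """Forward maximum matching: at each position, take the longest word in
    `dictionary` that starts there; unknown characters become 1-char tokens."""
    max_len = max((len(w) for w in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a dictionary hit
        # or a single character remains.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


# Toy dictionary made up for the example.
DICTIONARY = {"北京", "北京市", "朝阳区", "建国路", "88号"}

if __name__ == "__main__":
    print(fmm_segment("北京市朝阳区建国路88号", DICTIONARY))
```

Because the longest match wins, `北京市` is kept whole even though `北京` is also in the dictionary. This greedy choice is what makes the method fast, and it is also why a statistical pass is usually added to repair wrong splits.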
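A POI match score weighted toward name and address, with telephone and coordinates as supplementary evidence, might look like the following sketch. The weights, the 200 m distance cutoff, the alias table for abbreviations and typos, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions, not the thesis's parameters. English addresses keep the example readable; for Chinese text, the alias lookup would run on segmented tokens instead.

```python
import math
import re
from difflib import SequenceMatcher

# Hypothetical table mapping abbreviations and habitual typos to a canonical form.
ADDRESS_ALIASES = {"rd": "road", "raod": "road", "st": "street", "ave": "avenue"}

# Assumed weights: name and address dominate; phone and location are supplementary.
W_NAME, W_ADDR, W_PHONE, W_GEO = 0.45, 0.35, 0.10, 0.10
GEO_CUTOFF_M = 200.0  # assumed distance under which coordinates "agree"


def normalize_address(addr):
    """Lower-case, tokenize and replace aliases (ASCII tokens only in this toy)."""
    words = re.findall(r"[a-z0-9]+", addr.lower())
    return " ".join(ADDRESS_ALIASES.get(w, w) for w in words)


def normalize_phone(phone):
    """Keep digits only, so '010-1234' and '0101234' compare equal."""
    return re.sub(r"\D", "", phone or "")


def similarity(a, b):
    return SequenceMatcher(None, a, b).ratio()


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres on a spherical Earth."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def match_score(deal, poi):
    """Score in [0, 1] for how likely `deal` refers to `poi`."""
    score = W_NAME * similarity(deal["name"].lower(), poi["name"].lower())
    score += W_ADDR * similarity(normalize_address(deal["address"]),
                                 normalize_address(poi["address"]))
    phone = normalize_phone(deal.get("phone"))
    if phone and phone == normalize_phone(poi.get("phone")):
        score += W_PHONE
    if "lat" in deal and "lat" in poi:
        if haversine_m(deal["lat"], deal["lon"], poi["lat"], poi["lon"]) < GEO_CUTOFF_M:
            score += W_GEO
    return score
```

A deal would then be linked to its highest-scoring candidate POI only if that score clears a threshold (for instance 0.75, again an assumed value). Otherwise it is left unmatched.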
In the data cleaning stage, the results of correlating and matching the various preferential sources against POI information are used to clean missing, erroneous and duplicate records in the preferential information. The data transformation and loading stage mainly converts the preferential data into the same format as the destination table in the data warehouse and then loads it into the DW. Finally, this thesis tests and verifies the feasibility of the approach and introduces the development of reports.
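Cleaning against the matching results could work roughly as follows. Once each deal carries the ID of its matched POI, missing or conflicting merchant fields are overwritten from the POI record, and duplicates (same POI, same normalized title) are dropped. The field names, the duplicate key and the policy of treating the POI record as authoritative are assumptions for illustration.

```python
def clean_deals(deals, poi_index):
    """Clean deals using POI matching results (hypothetical field names).

    deals:     list of dicts; 'poi_id' is set by the matching step (None if unmatched).
    poi_index: poi_id -> POI record, treated here as the authoritative source.
    Returns (cleaned, unmatched).
    """
    seen = set()
    cleaned, unmatched = [], []
    for deal in deals:
        poi = poi_index.get(deal.get("poi_id"))
        if poi is None:
            unmatched.append(deal)  # cannot be verified; set aside for review
            continue
        record = dict(deal)
        # Fill missing values and overwrite conflicting ones from the POI.
        for field in ("name", "address", "phone"):
            if poi.get(field):
                record[field] = poi[field]
        # Duplicate = same POI and same title after case/whitespace folding.
        key = (record["poi_id"], " ".join(record["title"].lower().split()))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(record)
    return cleaned, unmatched
```

Keeping unmatched deals in a separate list, rather than silently discarding them, leaves room for manual review or a later matching pass.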
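Transformation and loading can be sketched as a per-record mapping into the destination table's format, followed by a single transactional insert. The `fact_deal` table, its columns, the source date format `%Y/%m/%d` and the conversion of prices to integer cents are hypothetical choices. SQLite stands in for the data warehouse.

```python
import sqlite3
from datetime import datetime


def transform(deal):
    """Map a cleaned deal to the (hypothetical) fact_deal row format."""
    return (
        deal["poi_id"],
        deal["title"].strip(),
        round(float(deal["price"]) * 100),  # price as integer cents
        datetime.strptime(deal["valid_until"], "%Y/%m/%d").date().isoformat(),
    )


def load(conn, deals):
    """Transform all deals first, then insert them in one transaction."""
    rows = [transform(d) for d in deals]
    with conn:  # commits on success, rolls back if any insert fails
        conn.executemany(
            "INSERT INTO fact_deal (poi_id, title, price_cents, valid_until) "
            "VALUES (?, ?, ?, ?)",
            rows,
        )
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE fact_deal (poi_id INTEGER, title TEXT, "
        "price_cents INTEGER, valid_until TEXT)"
    )
    load(conn, [{"poi_id": 7, "title": " 20% off lunch ", "price": "39.9",
                 "valid_until": "2014/12/31"}])
    print(conn.execute("SELECT * FROM fact_deal").fetchall())
```

Transforming every record before opening the transaction means a malformed record (for example, an unparseable date) raises an error before anything is written, so a batch is loaded either completely or not at all.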

Keywords/Search Tags:

Information Aggregation, Data Warehouse, Big Data Analytics, Chinese Word Segmentation, Data Cleaning, Point of Interest

Related items

1	Research On Data Cleaning Based On Science And Technology Innovation Big Data Public Platform
2	The Data Integration, Analysis And Utilization For Hospital Information Based On The Data Warehouse
3	Research Of Data Cleaning Method Based On Data Warehouse
4	The Research And Application Of Data Preprocessing In XML Data Warehouse
5	The Comprehensive Resource Information Design And Application On The Basis Of Data Warehouse Technology
6	Some Main Technology's Research Of Data Cleaning
7	Research And Application Of Data Cleaning In The Construction Of POI Data Warehouse
8	Key Technologies Of Spatial Data Warehouse And Its Application On The Urban Criminal Hotspots Analytics
9	Research And Implement Of Data Warehouse ETL Technology
10	The Research And Application Of Data Cleaning Technique