While each data pipeline is different, this blueprint provides typical and preferred requirements for Data Engineering teams.
Processing Architecture
Our exposure log pipelines use batch processing, executed on a daily, weekly, or bi-weekly cadence, depending on the circumstances.
Exchange Methods
The most common and preferred method is a file-based data exchange, where the exposure logs are placed in cloud storage (an AWS S3 bucket or a Google Cloud Storage bucket) and Adara's IAM principals are then granted access to the storage location.
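For the S3 option, access is typically granted with a bucket policy. Below is a minimal sketch using boto3; the bucket name and principal ARN are hypothetical placeholders, and the actual principals to allow are the ones Adara provides during onboarding.

```python
import json

import boto3  # assumes AWS credentials with s3:PutBucketPolicy are configured

# Placeholders only -- substitute your bucket and the principal Adara provides.
BUCKET = "example-exposure-logs"
ADARA_PRINCIPAL = "arn:aws:iam::111122223333:root"  # hypothetical account

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {  # allow listing the bucket contents
            "Effect": "Allow",
            "Principal": {"AWS": ADARA_PRINCIPAL},
            "Action": "s3:ListBucket",
            "Resource": f"arn:aws:s3:::{BUCKET}",
        },
        {  # allow reading the exposure log files
            "Effect": "Allow",
            "Principal": {"AWS": ADARA_PRINCIPAL},
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
        },
    ],
}

boto3.client("s3").put_bucket_policy(Bucket=BUCKET, Policy=json.dumps(policy))
```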
Alternatively, we can receive data through a LiveRamp feed.
If you prefer data exchange via a clean room, database, or API connection, we may be able to accommodate these after an evaluation of the technical details of the connection. In these cases, we request that a sandbox/dev environment be provided for the purposes of evaluation.
Log Type
We only need the exposure/impression logs for measurement. Other log types, such as clicks or views, are not required.
Delivery Cadence
We prefer that exposure data is delivered in daily files, each covering a single day of impressions.
Time Zone
We have no time zone preference, but we expect that dates, times, and timestamps either explicitly include a time zone designation (preferred) or consistently follow a single time zone.
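As an illustration, the snippet below (standard library only) emits a timestamp with an explicit time zone designation, in the same style as the sample data shown later in this document; the format itself is an example, not a requirement.

```python
from datetime import datetime, timezone

# Illustrative only: an impression timestamp with an explicit time zone.
ts = datetime(2024, 6, 1, 17, 42, 23, tzinfo=timezone.utc)
print(ts.strftime("%Y-%m-%d %H:%M:%S %Z"))  # 2024-06-01 17:42:23 UTC
```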
File Format
The preferred formats are compressed CSV and Apache Parquet.
| CSV file specifications | |
|---|---|
| Column separator | comma `,` |
| Quotation | double-quote `"` |
| First row | contains column names |
| Encoding | UTF-8 |
| Compression | gzip |
| File extension | `.csv.gz` |
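A minimal sketch of writing a file to this specification with Python's standard library (the file name, column subset, and row values are illustrative only):

```python
import csv
import gzip

# Illustrative rows; real files would carry the full column set.
rows = [
    {"impression_id": "8172635", "impression_ts": "2024-06-01 17:42:23 UTC",
     "user_id": "ow8ed7hw723e67", "user_id_type": "HEM"},
]

# UTF-8, comma-separated, double-quoted, header row first, gzip-compressed.
with gzip.open("sample_advertiser_345_20240810000000_20240810235959_0001.csv.gz",
               "wt", encoding="utf-8", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(rows[0]), quotechar='"',
                            quoting=csv.QUOTE_MINIMAL)
    writer.writeheader()
    writer.writerows(rows)
```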
Folder Structure
For efficient processing, we recommend organizing the folders and files by advertiser ID and by the date of impression. Please do not use the date of export for the folder structure!
For example:
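(The layout below is illustrative; the bucket and advertiser names are placeholders.)

```
s3://example-exposure-logs/
  sample_advertiser_345/            <- advertiser ID
    2024-08-10/                     <- date of impression, not date of export
      sample_advertiser_345_20240810000000_20240810235959_0001.csv.gz
    2024-08-11/
      sample_advertiser_345_20240811000000_20240811235959_0001.csv.gz
```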
File Names
We recommend standardizing the file names by following a naming convention:
[advertiser_id]_[timestamp_start]_[timestamp_end]_[file_version_id].[extension]
| Component | Description |
|---|---|
| `advertiser_id` | the name or ID of the advertiser |
| `timestamp_start` | the start of the period when the exposures occurred, in `yyyyMMddHHmmss` format; 00:00:00 (midnight) if daily files are used |
| `timestamp_end` | the end of the period when the exposures occurred, in `yyyyMMddHHmmss` format; 23:59:59 if daily files are used |
| `file_version_id` | a unique identifier of the file version, so that if a file needs to be exported again, the versions can be distinguished; can be a unique hash, the timestamp of the export, etc. |
| `extension` | `.csv` (uncompressed CSV), `.csv.gz` (gzip-compressed CSV), or `.parquet` (Apache Parquet) |
For example: sample_advertiser_345_20240810000000_20240810235959_0001.csv.gz
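File names following this convention can be checked mechanically. A minimal sketch in Python; the pattern and sample name are illustrative, not a normative grammar:

```python
import re

# Matches [advertiser_id]_[timestamp_start]_[timestamp_end]_[file_version_id].[extension]
PATTERN = re.compile(
    r"^(?P<advertiser_id>.+)"
    r"_(?P<timestamp_start>\d{14})"
    r"_(?P<timestamp_end>\d{14})"
    r"_(?P<file_version_id>[^_.]+)"
    r"\.(?P<extension>csv|csv\.gz|parquet)$"
)

m = PATTERN.match("sample_advertiser_345_20240810000000_20240810235959_0001.csv.gz")
assert m is not None and m["timestamp_start"] == "20240810000000"
```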
Columns
At a minimum, the log data must contain:
The user ID must be one of the following:
If your data has other ID types, a technical evaluation is required before acceptance.
While not required, we strongly encourage including additional columns for:
Additional columns may be included in the data.
Use the following CSV sample as a template for your data:
```csv
impression_id,impression_ts,advertiser_id,campaign_id,creative_id,adgroup_id,user_id,user_id_type,country,browser_type
8172635,2024-06-01 17:42:23 UTC,7567968,8856778,67986,34567,ow8ed7hw723e67,HEM,United States,Chrome
7832378,2024-06-02 11:20:11 UTC,7567968,8856778,67986,34567,kd3203ju2j39dj,HEM,United States,Edge
```
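For a sense of how such a file is consumed, here is a minimal sketch that reads a delivered file and checks its header row. The required-column set below is an assumption inferred from the sample, not the authoritative minimum list:

```python
import csv
import gzip

# Assumed minimum columns, inferred from the sample above (illustrative only).
REQUIRED = {"impression_id", "impression_ts", "advertiser_id",
            "campaign_id", "creative_id", "user_id", "user_id_type"}

with gzip.open("sample_advertiser_345_20240810000000_20240810235959_0001.csv.gz",
               "rt", encoding="utf-8", newline="") as f:
    reader = csv.DictReader(f)
    missing = REQUIRED - set(reader.fieldnames or [])
    assert not missing, f"missing columns: {missing}"
```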
Metadata
Our customers often request that we use the names of campaigns, orders, and creatives in our reports, instead of IDs. To do this, we need the corresponding metadata. Please let us know if you can provide it as part of your data feed.
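For illustration, a hypothetical campaign metadata file might look like the following (all IDs and names are made up):

```csv
advertiser_id,campaign_id,campaign_name,creative_id,creative_name
7567968,8856778,Summer Getaways 2024,67986,Beach Banner 300x250
```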
Data Dictionary
Our engineers will appreciate written documentation for your data feed; a data dictionary or an entity-relationship (ER) diagram will greatly expedite the pre-approval process.
Advertiser Onboarding
Please provide step-by-step instructions that we can share with our Advertisers on how to request a data feed when they want their impression logs delivered to Adara. Include where to submit a request, what information to provide, and the typical time to process the request.
Technical Support
We ask that you share a designated channel where we can request technical support with your data feed.