Technical Specifications for Sharing Impression Logs
While each data pipeline is different, this document provides Data Engineering teams with a blueprint of typical and preferred requirements.
Data Exchange Methods
Processing architecture
Our exposure log pipelines use batch processing, executed on a daily, weekly, or bi-weekly cadence, depending on the circumstances.
Exchange methods
The most common and preferred method is file-based data exchange: the exposure logs are placed in cloud storage (an AWS S3 bucket or a Google Cloud Storage bucket), and Adara’s IAM principals are then granted access to that storage location.
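As a minimal sketch of the S3 variant, the snippet below builds a bucket policy that lets one IAM principal list a single export prefix and read the objects under it, and nothing else. The bucket name, prefix, and principal ARN are hypothetical placeholders; the real principal would be the one Adara provides, and a GCS bucket would need an equivalent IAM binding instead of this policy.

```python
import json


def read_only_bucket_policy(bucket: str, prefix: str, principal_arn: str) -> dict:
    """Build an S3 bucket policy granting read-only access to one prefix.

    All arguments are placeholders for illustration; substitute the real
    bucket, export prefix, and the IAM principal ARN you are given.
    """
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Allow listing only the keys under the export prefix.
                "Sid": "ListExportPrefix",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                # Allow downloading the objects under the export prefix.
                "Sid": "ReadExportObjects",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
        ],
    }


if __name__ == "__main__":
    policy = read_only_bucket_policy(
        bucket="example-impression-logs",  # hypothetical bucket
        prefix="exposure_logs",  # hypothetical prefix
        principal_arn="arn:aws:iam::111122223333:role/example-reader",  # hypothetical ARN
    )
    print(json.dumps(policy, indent=2))
```

Scoping access to one prefix keeps the grant limited to the exported logs rather than the whole bucket.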
Alternatively, we can receive data through a LiveRamp feed.
If you prefer to exchange data via a clean room, database, or API connection, we may be able to accommodate it after evaluating the technical details of the connection. In these cases, please provide a sandbox or development environment for the evaluation.
Data Format
Log Type
We only need the exposure/impression logs for measurement. Other log types, such as clicks or views, are not required.
Delivery Cadence
We prefer that exposure data be:
- Provided on a daily basis, covering each calendar day in a given time zone (12:00:00 AM to 11:59:59 PM)
- Delivered no more than 24 hours after the end of the day on which the impressions occurred
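The daily window and the delivery deadline above can be sketched as follows. This assumes that "the end of the day" means the following midnight, so the preferred deadline is 48 hours after the day starts, and it uses fixed-offset time zones from the standard library (no daylight-saving handling); the function names are hypothetical.

```python
from datetime import date, datetime, time, timedelta, timezone
from typing import Tuple


def daily_window(day: date, tz: timezone) -> Tuple[datetime, datetime]:
    """Return the first and last second of a calendar day in the given time zone."""
    start = datetime.combine(day, time(0, 0, 0), tzinfo=tz)
    end = datetime.combine(day, time(23, 59, 59), tzinfo=tz)
    return start, end


def delivery_deadline(day: date, tz: timezone) -> datetime:
    """Latest preferred delivery time: 24 hours after the day ends (next midnight)."""
    end_of_day = datetime.combine(day + timedelta(days=1), time(0, 0, 0), tzinfo=tz)
    return end_of_day + timedelta(hours=24)


def is_on_time(day: date, tz: timezone, delivered_at: datetime) -> bool:
    """True if the file for `day` arrived within the preferred window.

    `delivered_at` must be timezone-aware; aware datetimes in different
    zones compare correctly.
    """
    return delivered_at <= delivery_deadline(day, tz)
```

For example, with `tz = timezone(timedelta(hours=-5))`, the file for 2024-06-01 covers 2024-06-01 00:00:00 to 23:59:59 at UTC-5 and should arrive by 2024-06-03 00:00:00 at UTC-5.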
Time Zone
We have no time zone preference, but we expect dates, times, and timestamps either to include an explicit time zone designation (preferred) or to consistently use the same time zone.
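To illustrate why an explicit designation is preferred, the hypothetical helper below normalizes timestamps to UTC. It accepts ISO 8601 strings with an offset and the "YYYY-MM-DD HH:MM:SS UTC" form used in the CSV template later in this document; a timestamp without a designation is rejected unless the feed's agreed time zone is supplied, which is the "consistently use the same time zone" case.

```python
from datetime import datetime, timezone, tzinfo
from typing import Optional


def to_utc(raw: str, assumed_tz: Optional[tzinfo] = None) -> datetime:
    """Parse an impression timestamp and convert it to UTC.

    Accepts ISO 8601 strings with an explicit offset
    (e.g. "2024-06-01T17:42:23+02:00") and strings ending in " UTC".
    Timestamps without a designation need `assumed_tz`, the feed's
    agreed, consistent time zone.
    """
    text = raw.strip()
    if text.endswith(" UTC"):
        # Drop the " UTC" suffix and attach the UTC zone explicitly.
        parsed = datetime.fromisoformat(text[: -len(" UTC")]).replace(tzinfo=timezone.utc)
    else:
        parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        if assumed_tz is None:
            raise ValueError(f"timestamp has no time zone designation: {raw!r}")
        parsed = parsed.replace(tzinfo=assumed_tz)
    return parsed.astimezone(timezone.utc)
```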
File Format
The preferred formats are gzip-compressed CSV and Apache Parquet.
CSV file specifications | |
---|---|
Column separator | comma (,) |
Quotation | double quote (") |
First row | contains column names |
Encoding | UTF-8 |
Compression | gzip |
File extension | .csv.gz |
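A file matching the CSV specifications above can be produced with Python's standard `csv` and `gzip` modules. This is a minimal sketch; the function name `write_csv_gz` is hypothetical, and quoting is left at the `csv` module's minimal setting, so values are wrapped in double quotes only when they contain a comma, quote, or line break.

```python
import csv
import gzip
from typing import Iterable, Sequence


def write_csv_gz(path: str, header: Sequence[str], rows: Iterable[Sequence[object]]) -> None:
    """Write rows as a gzip-compressed, UTF-8, comma-separated CSV file.

    The first row holds the column names; fields are quoted with double
    quotes only when needed (e.g. when a value contains a comma).
    """
    if not path.endswith(".csv.gz"):
        raise ValueError("expected a .csv.gz file name")
    # Text mode over gzip; newline="" lets the csv module control line endings.
    with gzip.open(path, "wt", encoding="utf-8", newline="") as handle:
        writer = csv.writer(handle, delimiter=",", quotechar='"', quoting=csv.QUOTE_MINIMAL)
        writer.writerow(header)
        writer.writerows(rows)
```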
For Parquet files, we recommend the configurations described at https://parquet.apache.org/docs/file-format/configurations/
Folder and File Structure
Folder Structure
For efficient processing, we recommend organizing folders and files by advertiser ID and impression date. Please do not use the export date in the folder structure!
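A minimal sketch of this partitioning is shown below. The `<advertiser_id>/<yyyy>/<mm>/<dd>/` layout is a hypothetical choice for illustration, not a prescribed structure; the point it demonstrates is that each row's folder is derived from its impression timestamp, converted to the feed's time zone, rather than from the export date. Rows are assumed to carry timezone-aware `datetime` values.

```python
from collections import defaultdict
from datetime import datetime, timezone, tzinfo
from typing import Dict, Iterable, List


def folder_for(advertiser_id: str, impression_ts: datetime, tz: tzinfo = timezone.utc) -> str:
    """Return a hypothetical folder path keyed by advertiser and impression date.

    The date comes from the impression timestamp (converted to the feed's
    time zone), never from the date the file was exported.
    """
    day = impression_ts.astimezone(tz).date()
    return f"{advertiser_id}/{day:%Y/%m/%d}/"


def partition_rows(rows: Iterable[dict], tz: tzinfo = timezone.utc) -> Dict[str, List[dict]]:
    """Group impression rows into folders by advertiser ID and impression date."""
    folders: Dict[str, List[dict]] = defaultdict(list)
    for row in rows:
        folders[folder_for(str(row["advertiser_id"]), row["impression_ts"], tz)].append(row)
    return dict(folders)
```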
File Names
We recommend standardizing file names using the following naming convention:
[advertiser_id]_[timestamp_start]_[timestamp_end]_[file_version_id].[extension]
Field | Description |
---|---|
advertiser_id | the name or ID of the advertiser |
timestamp_start | the start of the period in which the exposures occurred, in yyyyMMddHHmmss format (00:00:00, midnight, if daily files are used) |
timestamp_end | the end of the period in which the exposures occurred, in yyyyMMddHHmmss format (23:59:59 if daily files are used) |
file_version_id | a unique identifier of the file version, so that versions can be told apart if a file is exported again; for example, a unique hash or the export timestamp |
extension | .csv (uncompressed CSV), .csv.gz (gzip-compressed CSV), or .parquet (Apache Parquet) |
For example: sample_advertiser_345_20240810000000_20240810235959_0001.csv.gz
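The naming convention can be sketched as a pair of hypothetical helpers, one that builds a daily file name and one that parses it back. They use single underscores between parts, as in the example file name, and a 24-hour clock for the timestamps. Parsing splits from the right so that advertiser IDs containing underscores survive; this assumes the version ID itself contains no underscores.

```python
from datetime import date, datetime, time
from typing import NamedTuple

KNOWN_EXTENSIONS = (".csv.gz", ".csv", ".parquet")
TS_FORMAT = "%Y%m%d%H%M%S"  # yyyyMMddHHmmss, 24-hour clock


class FileName(NamedTuple):
    advertiser_id: str
    timestamp_start: datetime
    timestamp_end: datetime
    file_version_id: str
    extension: str


def build_daily_file_name(advertiser_id: str, day: date, file_version_id: str, extension: str) -> str:
    """Compose a daily file name covering 00:00:00 to 23:59:59 of `day`."""
    if extension not in KNOWN_EXTENSIONS:
        raise ValueError(f"unsupported extension: {extension}")
    start = datetime.combine(day, time(0, 0, 0))
    end = datetime.combine(day, time(23, 59, 59))
    return f"{advertiser_id}_{start:{TS_FORMAT}}_{end:{TS_FORMAT}}_{file_version_id}{extension}"


def parse_file_name(name: str) -> FileName:
    """Split a file name back into its parts (right-to-left split)."""
    for extension in KNOWN_EXTENSIONS:
        if name.endswith(extension):
            stem = name[: -len(extension)]
            break
    else:
        raise ValueError(f"unrecognized extension in {name!r}")
    advertiser_id, start, end, version = stem.rsplit("_", 3)
    return FileName(
        advertiser_id,
        datetime.strptime(start, TS_FORMAT),
        datetime.strptime(end, TS_FORMAT),
        version,
        extension,
    )
```

Calling `build_daily_file_name("sample_advertiser_345", date(2024, 8, 10), "0001", ".csv.gz")` yields `sample_advertiser_345_20240810000000_20240810235959_0001.csv.gz`.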
Data Content
Columns
At a minimum, the log data must contain:
- Unique impression ID
- Impression timestamp
- Advertiser ID
- Campaign ID
- User ID
The user ID must be one of the following:
- IPv4 address
- Hashed email address (HEM), hashed with SHA-256
- Adara Cookie ID
- The Trade Desk Unified ID 2.0 (UID2)
- Mobile advertising ID (MAID): IDFA (Apple) or AAID (Android)
If your data has other ID types, a technical evaluation is required before acceptance.
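A minimal validation sketch for the required columns and user ID types follows. Column names match the CSV template below; the ID-type labels other than `HEM` are hypothetical, and the email normalization in `hash_email` (trim and lowercase before hashing) is an assumption, since this specification only names SHA-256. Note that the placeholder `user_id` values in the template are not real SHA-256 digests, so this check would flag them.

```python
import hashlib
import ipaddress
import re
from typing import List, Mapping

# Column names follow the CSV template in this document.
REQUIRED_COLUMNS = ("impression_id", "impression_ts", "advertiser_id", "campaign_id", "user_id")

# Hypothetical labels for the accepted user ID types; only "HEM" appears in the template.
ACCEPTED_ID_TYPES = {"IPV4", "HEM", "ADARA_COOKIE", "UID2", "MAID"}

SHA256_HEX = re.compile(r"[0-9a-f]{64}")


def hash_email(email: str) -> str:
    """SHA-256 hex digest of an email after trimming and lowercasing (assumed normalization)."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


def validate_row(row: Mapping[str, str]) -> List[str]:
    """Return a list of problems found in one impression row (empty if none)."""
    problems = [f"missing {col}" for col in REQUIRED_COLUMNS if not row.get(col)]
    id_type = (row.get("user_id_type") or "").upper()
    user_id = row.get("user_id") or ""
    if id_type not in ACCEPTED_ID_TYPES:
        problems.append(f"user ID type {id_type!r} needs a technical evaluation")
    elif id_type == "IPV4":
        try:
            ipaddress.IPv4Address(user_id)
        except ValueError:  # AddressValueError is a subclass of ValueError
            problems.append(f"not an IPv4 address: {user_id!r}")
    elif id_type == "HEM" and not SHA256_HEX.fullmatch(user_id.lower()):
        problems.append("HEM is not a 64-character SHA-256 hex digest")
    return problems
```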
While not required, we strongly encourage including additional columns for:
- Creative ID
- Order ID
- AdGroup ID
- Line Item ID
- Device information
- Browser information
- User geo information
Additional columns may be included in the data.
Use the following CSV sample as a template for your data:
impression_id,impression_ts,advertiser_id,campaign_id,creative_id,adgroup_id,user_id,user_id_type,country,browser_type
8172635,2024-06-01 17:42:23 UTC,7567968,8856778,67986,34567,ow8ed7hw723e67,HEM,United States,Chrome
7832378,2024-06-02 11:20:11 UTC,7567968,8856778,67986,34567,kd3203ju2j39dj,HEM,United States,Edge
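As a sketch of how a file following this template might be consumed, the snippet below reads the sample with `csv.DictReader`, parses the `... UTC` timestamps as timezone-aware UTC values, and keeps any optional columns in a separate `extra` mapping. The `Impression` record type and `read_impressions` function are hypothetical names for illustration.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List

TEMPLATE = """\
impression_id,impression_ts,advertiser_id,campaign_id,creative_id,adgroup_id,user_id,user_id_type,country,browser_type
8172635,2024-06-01 17:42:23 UTC,7567968,8856778,67986,34567,ow8ed7hw723e67,HEM,United States,Chrome
7832378,2024-06-02 11:20:11 UTC,7567968,8856778,67986,34567,kd3203ju2j39dj,HEM,United States,Edge
"""

CORE_COLUMNS = ("impression_id", "impression_ts", "advertiser_id", "campaign_id", "user_id", "user_id_type")


@dataclass
class Impression:
    impression_id: str
    impression_ts: datetime
    advertiser_id: str
    campaign_id: str
    user_id: str
    user_id_type: str
    extra: Dict[str, str]  # optional columns such as creative_id, country, browser_type


def read_impressions(text: str) -> List[Impression]:
    """Parse CSV text in the template layout into Impression records."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        # The template writes UTC timestamps as "YYYY-MM-DD HH:MM:SS UTC".
        ts = datetime.strptime(row["impression_ts"], "%Y-%m-%d %H:%M:%S UTC").replace(tzinfo=timezone.utc)
        records.append(Impression(
            impression_id=row["impression_id"],
            impression_ts=ts,
            advertiser_id=row["advertiser_id"],
            campaign_id=row["campaign_id"],
            user_id=row["user_id"],
            user_id_type=row["user_id_type"],
            extra={k: v for k, v in row.items() if k not in CORE_COLUMNS},
        ))
    return records
```

Keeping optional columns in `extra` means additional columns can be added to a feed without changing the parsing code.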
Metadata
Our customers often request that we use the names of campaigns, orders, and creatives in our reports, instead of IDs. To do this, we need metadata included in your data feed. Please let us know if you can provide this data as part of your data feed.
Advertiser Onboarding & Support
Data Dictionary
Our engineers will appreciate written documentation for your data feed; a data dictionary or an entity-relationship diagram (ERD) will greatly expedite the pre-approval process.
Advertiser Onboarding
Please provide step-by-step instructions that we can share with our Advertisers on how to request a data feed when they want their impression logs delivered to Adara. The instructions should cover where to submit a request, what information to include, and the typical processing time.
Technical Support
We ask that you share a designated channel where we can request technical support for your data feed.