ETL Tool Evaluation: Paid Media

With the ever expanding ETL tool arena, it can become challenging to decide which tool(s) is the right one for your use case. 

In this blog, we’re going to look at the paid media use case and evaluate some ETL tools your team would potentially use to bring in marketing data

Bing, Facebook, Google Ads, Google Analytics, Klaviyo, Snapchat, and Tiktok data sources have been selected for this evaluation. On the ETL side Airbyte, Fivetran, Hevo Data, Stitch, and Supermetrics have also been selected. Features such as data source coverage, pricing, supported destinations, and sync frequency will be compared across the tools. Keep in mind that this is by no means a comprehensive list of ETL tools available that can be used for paid media data ingestion.

Feature Airbyte Fivetran Hevo Stitch Supermetrics

COVERAGE

PRICING

  • $15 / month (1 million rows)
  • Pricing
  • $500 / month (1 million rows)
  • Free Tier - $0 (0.5 million rows)
  • Pricing
  • $239 / month (5 million rows)
  • Free Tier - $0 (1 million rows)
  • Pricing
  • $100 / month (5 million rows)
  • Pricing
  • $239 / month (Google Sheets)
  • Pricing

DESTINATIONS

  • Warehouses
  • Data Lakes
  • Databases
  • Google Sheets
  • CSVs
  • View All
  • Warehouses
  • Google Sheets
  • SFTP
  • Cloud Storage
  • BI Tools
  • View All

FREQUENCY

COVERAGE

One of the features you’re looking for in an ETL product is data source coverage. That is, can they support your existing and future data sources? One can always use multiple ETL tools, however doing so can potentially increase the overall data ingestion costs. It also adds more overhead when it comes to managing your data ecosystem, e.g. maintenance, keeping track of changes, monitoring for data outages, etc.

In this particular instance, while almost all of the ETL tools supported the data sources evaluated, additional scrutiny is still required.  While a product might specify that they support data ingestion for a specific source, one should also consider which fields are pulled in and how the data is pulled in.

Data Fields

Your reporting and analysis is going to depend on what data you are able to pull.  For example Fivetran’s Tiktok connector does not pull in the complete payment event. While there is an open feature request for this metric, as it stands now, one’s ability to run ad performance and revenue analysis from that data would be impacted. The Supermetrics Tiktok connector on the other hand pulls in this field and you can easily run such reports.

Data Format

Another important consideration is the format of the data pulled in. With Klaviyo, Hevo and Fivetran pull tracked events into an Event table. The data can be more granular compared to the Supermetrics or Stitch Klaviyo connectors. This means more flexibility in your reporting and simply more data to work with.  There are however cons to this, the big one being cost. Whenever you have a tool that brings in event data, it is important to know the order of magnitude of the data that will be ingested. Is it 20k  or 20 million events per month? Keep in mind, you can’t necessarily pick and choose which event types are imported in the event stream. The other impact is when it comes to preparing the data and getting it production ready. If you have all your events in one gigantic table, you will need to write queries and do more work to transform the data and generate the clean datasets that you need for reporting.

The reporting that you are able to do with the data is going to vary across each ETL product depending on the fields pulled in and the format of the data.

 

PRICING

Arguably, the most important factor for most people is how much it is going to cost. 

Pricing, and the factors that affect pricing, is going to vary across the tools. Most ETL tools use some form of a row based pricing model, Airbyte being an exception in this instance. For example, with the monthly active rows (MAR) model, you are charged based on how many new rows were added to your destination or how many existing rows were updated within a month. And in some cases, the more MARs you have per month, the cost per row gets cheaper. Other factors that impact how much you will pay also include:

  • The features available, e.g. sync frequency speeds or how many users are allowed on the platform. Most of these tools have different tiers which offer different features.
  • Which destination you are sending your data to. For example, Supermetrics has different pricing for the Google Sheets destination vs warehouse destinations. Currently they are not transparent about warehouse pricing on their website.
  • Which data sources you want to pull data from. This is becoming more common.
    • Hevo has a free tier (up to a million rows). However, only a subset of connectors fall under this tier. The next tier up which while offers access to even more connectors, jumps up to 5 million rows at a much higher cost. A comprehensive list of which connectors fall under the free tier could not be found at this time.
    • Fivetran also has a free tier (up to 0.5 million rows) which does not include database type data sources. However beyond the free tier, pricing is more scalable.
  • Long term contract.
  • and many more.

A number of these tools do have cost estimators to help you determine how much they are going to cost.

One important note I would like to highlight on pricing is expect that the prices, pricing tiers, or even fundamental pricing models may change. Meaning the numbers in the table above may be different than what is currently advertised. Fivetran’s free tier was only just announced inFeb 2023. For companies with small data volumes, this is a game changer.

 

DESTINATIONS

The vast majority of ETL Tools do support data warehouses as a destination. However, a number of them do support additional destinations. If you are currently running your reporting and analytics outside of a data warehouse, and are not in a position yet to transition to the data warehouse architecture, you can still take advantage of these ETL tools to improve your workflow. A common use case is with Google Sheets. Usually the data ingestion process is manual, i.e. running a report on your ad platform, exporting it and appending the data into a Sheet. This process can be laborious, inefficient, and prone to errors. Tools such as Supermetrics and Airbyte will help you streamline that process and a lot of the time this means that you have more up to date data.

SYNC FREQUENCY

How fresh you want your data to be really depends on your business needs. Future needs should also be taken into consideration. More often than not, as your business becomes more data driven, the need to have fresher data or even close to real time data grows. Taking Supermetrics as an example, their Essential, Core, and Super data plans only offer a daily refresh. For hourly refreshes, one would need to upgrade to the Enterprise plan. Selecting a tool that can offer faster sync speeds at the right price point is important especially if you foresee your data organization needing this feature in the near future.

 

ADDITIONAL CONSIDERATIONS

Additional factors to take into consideration when evaluating ETL Tools include:

  • Quality of customer support – This is sometimes an overlooked factor that can heavily impact your data ecosystem. If you pick an open source ETL Tool such as Airbyte or Stitch, while they will generally be cheaper, consider whether the plan you are on gives you access to their support team.  This can also help you figure out your internal SLAs for your data organization because some aspect of your data flow is guaranteed to break; whether it’s at the data source, due to user error, or due to a bug in the ETL Tool. Also keep in mind that some of these tools, such as Airbyte have fantastic slack communities that can help support you.
  • Longevity / Reputation – Some of these ETL companies have been around for years and others are relatively new. If you know a tool is well-established with a reputation for excellence and reliability, that can go a long way.
  • Nicheness – If you are in a specific or niche industry, it can be advantageous working with an ETL company that has a focus in that industry. You will find that they generally have more coverage for relevant data sources, even more esoteric ones, and the quality of the reports you can build are better and more relevant to your industry. For example tools like Supermetrics or even Funnel (not evaluated) have an emphasis on marketing data sources. In the case of the Supermetrics, building a production ready Bing Ads report in Google Sheets took less than 30 minutes and it did not require any intermediate transformations. To put it simply, they have a targeted audience and they know their audience.
 
The final and most important takeaway is to of course TRY the product. Most of these tools offer a trial period, usually for 14 days. Please take advantage of the full trial period so that you can see how the tool functions over a period of time. This gives you an opportunity to further evaluate:
 
  • Approximately how much data will be ingested. This will help you figure out the right plan and estimate costs.
  • How much additional maintenance will be required. That is, did the connector function without erroring out or you needing to go back and fine tune parameters.
  • How competent their support team is and how well they resolve or escalate issues. I always advise to open at least one support ticket during the trial period so that you can evaluate this before you are in production.
 
Have a question? Need help with your implementation? Email: hello@moderndata.co