[0:00]Hello, everyone. Thanks for coming. I'm going to kick us off and then I'm going to pass over to Luke. For this session, we're going to talk about understanding the real cost of your data warehouse, something that we're talking about more and more at Fivetran and with our customers. What we're seeing is that data warehouse costs are really starting to spiral. They're a lot higher than our customers expect and a lot higher than they'd like them to be. And the problem is that it's actually really hard to diagnose this problem, to understand what's actually driving the cost, where the costs are coming from. So we're going to talk through some of the reasons for that and what you can do about it. So first off, let's talk about the modern data stack. I'm sure for a lot of you, this is going to be familiar, but for those of you who it's not, I'm just going to quickly talk through what we mean by the modern data stack. So really, it's a series of tools that allow you to bring data into your data warehouse, transform it, and then analyze it so you can make decisions on top of that data. So typically, that means a series of different tools. At the beginning, you have your connectors that bring your data from wherever it lives, whether that's your CRM, your ERP, any kind of SaaS tool that you're using. And it brings it into your data warehouse, which is essentially a big database in the cloud. Then you've got your transformation tool, your BI tool, and then your reverse ETL tool. So this is what we mean when we talk about the modern data stack. And these tools have really changed the game in terms of data. They've really sped up the process, made it a lot easier for you to use your data to make decisions. But at the same time, we're seeing that they're really driving up the cost of your data warehouse. And that's because your data warehouse is at the center of this stack.
So every time you're using your connector, your transformation tool, your BI tool, your reverse ETL tool, you're actually paying for compute in your data warehouse. And the big reason for this is that it's actually really hard to understand how the cost is broken down. And that's because all of these tools have actually really different pricing models. And they're all optimized for very different things. So for example, at Fivetran, we optimize for uptime and reliability.
[2:12:00]So you can set up your connectors and you can forget about them.
[2:15:00]You don't have to worry about the data not arriving in your data warehouse.
[2:19:00]But what that means is that we run a lot of queries in your data warehouse to check for changes in your source system, to make sure that the data is arriving into your data warehouse accurately and consistently.
[2:31:00]At dbt, they're optimizing for speed and for flexibility.
[2:37:00]And so again, they're going to be querying your data warehouse a lot, loading a lot of data into your data warehouse, making changes to it really quickly.
[2:44:00]So again, that's going to be driving up compute costs.
[2:47:00]And all of these tools are doing the same thing.
[2:49:00]So you've got all of these different tools that are optimized for different things, that all have different pricing models.
[2:55:00]And at the center of it all is your data warehouse.
[2:58:00]And your data warehouse is charging for compute and for storage, both of which are being driven up by all of these tools.
[3:05:00]And so what we're seeing is that your data warehouse costs are really the biggest cost in your data stack.
[3:11:00]They're normally 50 to 70% of your total data stack spend.
[3:16:00]And that's why it's so important that you really understand what's driving those costs and where they're coming from.
[3:22:00]But as I said, it's really hard to unpack this because you have different teams using different tools at different times, all of which are querying your data warehouse.
[3:32:00]So it's really hard to understand how to get visibility into what's actually going on.
[3:37:00]So now I'm going to pass over to Luke, who's going to talk to you about how you can actually optimize these costs.
[3:43:00]Cheers. So you'll have spent a lot of time and money building out your modern data stack.
[3:49:00]And you'll have hired really intelligent people that are domain experts in your business to make sure that they're deriving value from the data, but then they'll also be spending a lot of time and a lot of money and a lot of effort actually looking at data warehouse costs and trying to optimize them and reduce them.
[4:06:00]So what if you could redirect all of that really intelligent effort into actually deriving value from the data, rather than trying to save money on compute costs within your data warehouse?
[4:18:00]And ultimately, what we think at Fivetran is that you should be trying to get the most value for money that you can from your data warehouse.
[4:25:00]So we're going to dive into some of the issues that we see, and then some of the ways that you can fix them.
[4:30:00]So typically, what we see is that there are multiple different teams that are interacting with the data warehouse.
[4:36:00]You'll have your data engineering team, your analytics engineering team, your data science, and your data analyst teams, all interacting with a centralized resource.
[4:45:00]And these different teams are also probably using different tools, different methodologies, and they'll all be running queries.
[4:52:00]So understanding exactly who's running what and where and when is a real challenge for a lot of data teams.
[4:59:00]And ultimately, this means that you end up with a high data warehouse spend because you don't have that granular visibility that you need to optimize it.
[5:08:00]So ultimately, you need a solution that is going to help you attribute the cost to the right team, to the right user, to the right tool and to the right query.
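The attribution idea just described can be sketched in a few lines of Python. This is a minimal illustration and not Fivetran's implementation: the `QueryRecord` fields, the sample records, and the flat `rate_per_second` price are all hypothetical stand-ins for whatever your warehouse's query history and billing model actually expose.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryRecord:
    """One row of a hypothetical warehouse query log."""
    query_id: str
    team: str
    user: str
    tool: str
    compute_seconds: float


def attribute_cost(records, rate_per_second, dimension):
    """Sum compute cost per value of `dimension` (e.g. "team", "user", "tool")."""
    totals = defaultdict(float)
    for record in records:
        key = getattr(record, dimension)
        totals[key] += record.compute_seconds * rate_per_second
    return dict(totals)


# Made-up sample data; the rate is an assumed flat price per compute second.
records = [
    QueryRecord("q1", "marketing", "ana", "reverse_etl", 120.0),
    QueryRecord("q2", "analytics", "ben", "bi", 30.0),
    QueryRecord("q3", "marketing", "ana", "reverse_etl", 60.0),
    QueryRecord("q4", "data_eng", "cy", "connector", 90.0),
]

by_team = attribute_cost(records, rate_per_second=0.5, dimension="team")
by_tool = attribute_cost(records, rate_per_second=0.5, dimension="tool")
print(by_team)
print(by_tool)
```

The same function covers every dimension mentioned here (team, user, tool, query) because it groups on an attribute name rather than hard-coding one breakdown.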
[5:17:00]So let's walk through an example.
[5:19:00]This could be a daily extract for a marketing campaign.
[5:23:00]So your marketing team, they're typically going to use the data warehouse to understand what's going on in the business.
[5:29:00]And they'll use tools like Reverse ETL to export that data back into their marketing campaign tools like Braze.
[5:37:00]Now, the issue here is that your Reverse ETL tool might actually be creating a new copy of the data.
[5:43:00]So you might be actually doubling up on your data warehouse spend.
[5:47:00]And it can also generate a lot of unneeded compute.
[5:50:00]Because they're optimizing for moving that data as quickly as possible.
[5:55:00]So what you need is visibility on who's running what.
[5:59:00]So here we have an example of a Fivetran dashboard where you can see the spend by service, by team, by user, by query type, and by query ID.
[6:09:00]So with this visibility, you're able to really home in on what's driving the costs in your data warehouse, and then to optimize them accordingly.
[6:18:00]So you're able to see which teams are spending the most on compute, which tools are spending the most on compute, and which individual queries might actually be misfiring, or might actually be taking up a lot more resource than you expect.
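One simple way to surface queries that use far more resource than expected is to compare each query's cost against the median. The sketch below assumes you already have a per-query cost mapping; the function name `flag_expensive`, the threshold `factor`, and the sample costs are hypothetical, and a real dashboard may use a different rule entirely.

```python
from statistics import median


def flag_expensive(costs_by_query, factor=5.0):
    """Return query IDs costing more than `factor` times the median, priciest first."""
    if not costs_by_query:
        return []
    baseline = median(costs_by_query.values())
    flagged = [q for q, cost in costs_by_query.items() if cost > factor * baseline]
    return sorted(flagged, key=lambda q: costs_by_query[q], reverse=True)


# Made-up per-query costs in arbitrary currency units.
costs = {"q1": 2.0, "q2": 3.0, "q3": 2.5, "q4": 40.0, "q5": 1.5}
outliers = flag_expensive(costs)
print(outliers)
```

The median is used instead of the mean so that a single runaway query does not inflate the baseline it is being compared against.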
[6:32:00]And then it's about making sure that you have the right solution that's going to automate that data warehouse cost optimization for you.
[6:40:00]So what Fivetran can actually do is reduce your queries by a factor of 10.
[6:45:00]And that can ultimately reduce your data warehouse costs by 10 to 30%.
[6:50:00]So how are we actually doing that?
[6:52:00]Ultimately, it's a series of different features that we have within the Fivetran product.
[6:57:00]First, we have active schema monitoring.
[7:00:00]So we're continuously monitoring your source data to make sure that we're only actually querying what needs to be changed and what needs to be updated.
[7:08:00]We're not sending unneeded queries to your data warehouse.
[7:11:00]Second, we have what we call native type mapping.
[7:14:00]So we're making sure that we're only writing the most efficient queries into your data warehouse so that you're saving on compute every single time a Fivetran connector actually runs.
[7:25:00]We also have automated table restructuring.
[7:28:00]And what that means is that Fivetran is actually managing that schema as it changes.
[7:33:00]So you don't actually have to have your data engineers doing that for you.
[7:36:00]And then finally, we have what we call warehouse load optimization.
[7:40:00]So what this means is that Fivetran is actually trying to bundle up all of the data into as few queries as possible to try and minimize the amount of compute that you're running.
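The general idea of bundling data into as few queries as possible can be illustrated with plain batching. This is a generic sketch of the technique, not a description of how Fivetran's warehouse load optimization works internally; the helper name `batched_rows` and the batch size of 250 are assumptions chosen for the example.

```python
from itertools import islice


def batched_rows(rows, batch_size):
    """Yield lists of up to `batch_size` rows, so each list can go in one load statement."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, batch_size))
        if not chunk:
            return
        yield chunk


# 1,000 pending row changes: one statement per row would mean 1,000 warehouse queries.
pending_rows = list(range(1000))
batches = list(batched_rows(pending_rows, batch_size=250))
print(f"{len(pending_rows)} rows loaded in {len(batches)} statements")
```

Because most warehouses carry a fixed overhead per statement, fewer and larger loads tend to use less compute than many small ones, although the best batch size depends on the warehouse.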
[7:49:00]Ultimately, Fivetran is providing you with that visibility to understand exactly what's driving your data warehouse costs, and then also automating the optimization of those costs.
[7:59:00]So your data teams can actually focus on deriving value from data rather than trying to save money on compute costs.
[8:06:00]So thank you very much for your time. And we'll open it up to questions.



