Last week I attended the AWS Summit in London, at the ExCeL centre in Docklands. It was amazing to see so many people attending in person.
Michael (our CEO) and I presented a Partner Spotlight talk, “FinOps: How to empower software engineers to take action”. We picked this topic because getting engineers to take action was the #1 issue identified in the FinOps Foundation survey for the past two years, and it’s something we have helped many clients with. We spoke about three main themes:
1 - Separate signal from noise
Engineers are busy. Their main focus is delivering features to drive the business forward. In some cases they may have a few hours allocated to optimizing cost, but in most cases any work will be done in their “spare time”.
Don’t spam them. Do your homework. Provide them with everything they need to make a decision; don’t ask them to do all the work. In fact, try to minimise the amount of work you ask them to do.
An analogy is with recruitment consultants. Sometimes they spam you with loads of CVs that match a few keywords. You have to do a load of work to filter them down to one or two that are a reasonable fit. You don’t tend to use that consultant for long. Whereas if they understand what you’re looking for, ask sensible questions, and provide you with a smaller number of CVs with a high hit rate, then you’ll keep on using them.
The equivalent of the first type is to send engineers a list of 1000 EC2 instances that your tool says can be rightsized. That very rarely achieves anything, and the number of “are they stupid, this obviously won’t work” items in there means that everything else you say in the future will be pretty much ignored.
Instead, do some homework and send something like this: “We found this large prod ASG. All the instances are r5.24xlarge, but CPU and memory haven’t gone above the level that a 4xlarge could handle on any of them. Resizing to that would save 40k per annum per instance, and on average there are 100 instances, so that’s 4m a year. I also see a similar ASG in stage, so maybe we could try the resize there first?”
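As a rough illustration of that kind of homework, here is a minimal Python sketch of the check and the arithmetic behind such a message. The vCPU and memory figures are the published sizes for r5.24xlarge and r5.4xlarge, and the 40k saving and 100 instances come from the example above. The 80% headroom threshold, the peak utilisation numbers and all the names are assumptions made up for this example, not output from any particular tool.

```python
from dataclasses import dataclass


@dataclass
class InstanceSize:
    name: str
    vcpus: int
    memory_gib: int


@dataclass
class PeakUsage:
    instance_id: str      # hypothetical identifier
    peak_cpu_pct: float   # peak CPU utilisation on the current size, 0-100
    peak_mem_pct: float   # peak memory utilisation on the current size, 0-100


def fits_target(usage, current, target, headroom=0.8):
    """True if the observed peaks fit on the target size with some headroom.

    headroom=0.8 means we only plan to use 80% of the target's capacity
    (an assumed safety margin, adjust to taste).
    """
    cpu_needed = current.vcpus * usage.peak_cpu_pct / 100
    mem_needed = current.memory_gib * usage.peak_mem_pct / 100
    return (cpu_needed <= target.vcpus * headroom
            and mem_needed <= target.memory_gib * headroom)


def annual_saving(saving_per_instance, average_instance_count):
    """Total yearly saving if every instance in the group is resized."""
    return saving_per_instance * average_instance_count


current = InstanceSize("r5.24xlarge", vcpus=96, memory_gib=768)
target = InstanceSize("r5.4xlarge", vcpus=16, memory_gib=128)

# Made-up peak figures for two instances in the ASG.
peaks = [
    PeakUsage("i-example-1", peak_cpu_pct=10.0, peak_mem_pct=12.0),
    PeakUsage("i-example-2", peak_cpu_pct=8.0, peak_mem_pct=9.5),
]

all_fit = all(fits_target(p, current, target) for p in peaks)
total = annual_saving(40_000, 100)
print(f"All instances fit on {target.name}: {all_fit}; saving per year: {total:,}")
```

The point of checking every instance, rather than an average, is that a single instance that doesn’t fit is exactly the kind of “this obviously won’t work” item that costs you credibility.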
2 - Make it as easy as possible to take action
The first part of this builds on the previous point. Make it as easy as possible for the engineer to make a decision on a specific optimization that makes sense in the context of their application. Do as much analysis as you can up-front and present the engineer with all the information they need.
The next step is to understand what execution of the optimization would involve. This will impact the amount of work needed, when it can be done, and who needs to be involved. All of that may even affect whether the optimization is worth doing: it might cost more to do than it will save in a year. It will also help you understand the most likely blockers to progress. In our experience, even after an optimization is agreed as valid in theory, many are never implemented. If you can anticipate the possible causes for this, you can look for ways to mitigate them.
The kind of questions to be asking yourself are: How exactly is this type of change made? Is the resource deployed using IaC? Are there a lot of manual steps, or is it/can it be automated? Will there need to be system downtime? Are there change windows that need to be considered? Does the engineer make these changes, or some other team? Whose approval/buy-in will be needed? Are there dev/test environments that should be changed first? Does it need to be (or could it be) coordinated with some other changes (optimizations or other)?
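The “might cost more to do than it will save in a year” test can be sketched as a simple payback calculation. This is a minimal Python illustration, not a full business case: the engineer hours, hourly rate and saving figures are hypothetical, as are the function names, and the one-year cut-off is just the rule of thumb mentioned above.

```python
def implementation_cost(engineer_hours, hourly_rate, other_costs=0.0):
    """Rough one-off cost of making the change (all inputs are estimates)."""
    return engineer_hours * hourly_rate + other_costs


def payback_months(cost, annual_saving):
    """Months until the cumulative saving covers the one-off cost."""
    if annual_saving <= 0:
        return float("inf")  # the change never pays for itself
    return cost * 12 / annual_saving


def pays_back_within_a_year(cost, annual_saving):
    return payback_months(cost, annual_saving) <= 12


# Hypothetical change: 80 engineer hours at a rate of 100 per hour.
cost = implementation_cost(engineer_hours=80, hourly_rate=100)

small_saving = 4_000       # hypothetical small optimization
large_saving = 4_000_000   # the ASG example from earlier in this post

print(payback_months(cost, small_saving), pays_back_within_a_year(cost, small_saving))
print(payback_months(cost, large_saving), pays_back_within_a_year(cost, large_saving))
```

The answers to the questions above (manual steps, downtime, change windows, other teams) are what drive the hours estimate, so it is worth revisiting the number once you know how the change is actually made.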
3 - Provide feedback loops + Show clear responsiveness and improvement
You won’t always get it spot on. There will be some context you don’t have. That’s OK, provided you do what homework you can and you learn from the feedback.
Back to our recruitment consultant analogy: You get some CVs through but several are missing the mark because there’s some specific experience that’s a must-have for the role. You explain this to the consultant. If next time there are CVs with the same problem you’ll be frustrated. Did the consultant not listen? Why are they wasting your time like this?
It’s basic customer service - listen to feedback and act on it.
In Summary - Context & Empathy
The common themes across all those areas are context and empathy. Understand the engineer’s world. Put yourself in their shoes. View them as a customer. Your job is to make it as easy as possible for them to optimize their cloud resource usage.