One of the many things I learned in 2022 is that I have a particular knack for understanding, analyzing, and optimizing the costs of data platform infrastructure. These skills were born out of both curiosity and necessity in the current economic climate, and have led me to start a small consultancy on the side: Buoyant Data. Big data infrastructure can be hugely valuable to lots of businesses, but unfortunately it's also a frequently misunderstood part of the cloud bill, and that's something I can help with!

Mike Julian from The Duckbill Group once proclaimed that the way to actually save money in AWS is to design your infrastructure to be cost-effective. "Optimization" techniques can only take you so far. Once you've burned through them, you may find yourself needing to reduce costs further with no more "fat" left to trim! In the first blog post, I outline a "reference architecture" for a data platform that I know to be cost-effective, easy to manage, and well suited to growth.

Planning for sensible, cost-conscious growth is very important. As most data platforms start to prove their value, the organization brings even more workloads to them. If you give a data scientist a good platform, they will want ever more from it, and Buoyant Data can help make sure that growth is sustainable and that its value to the business is easy to identify.

Please add the Buoyant Data RSS feed to your reader, as I have a number of blog posts queued up already with some gratis tips and tricks for understanding the cost of your data platform! 😄

The technology stack for Buoyant Data is something I cannot wait to write more about. After funding the creation of delta-rs as part of my day job, I am now using the library extensively to build extremely lightweight and cost-efficient data ingestion pipelines with Rust and AWS Lambda. There's still plenty of room for Apache Spark on the querying and processing side, but as DataFusion matures, I'm looking forward to exploring where it can fit into the picture.

There's a lot of evolution happening right now in the data and ML platform space, and I'm really looking forward to growing Buoyant Data in my spare time!