📌 Hello World
Who will enjoy this blog the most?
Serving LLMs at Scale: What Nobody Tells You About GPU Infrastructure
Running a large language model locally on your laptop is a weekend project. Running it reliably in production for hundreds of concurrent users is an infrastructure problem that will test every assumption you have about Kubernetes, GPU scheduling, and cost...
Why Multi-Cluster Kubernetes is Hard — and How We Solved It
For a long time, Kubernetes was sold as the solution to distributed systems complexity. And to be fair — for a single cluster, it largely delivers. But the moment you need to run workloads across multiple clusters, across cloud providers,...
Max out my code - Part 2
Welcome back! In Part 1 we looked at the tools C++ gives us to spawn threads: std::thread, std::future/promise, and std::async. We ran tasks concurrently but never actually made the tasks themselves faster.
Max out my code - Part 1
This article is an attempt at using C++ multithreading to write faster programs.
To Sync Or Not To Sync
Lately I have been working on code that relies heavily on multithreaded solutions.
Timing
We developers have to deal with the performance of our code on a daily basis. Most distributed networks have to worry about the burden added by every new line of code or new API used in their application.