How to scale up

November 29, 2024

As any software service grows it often needs to scale up and scale out.

Scaling up is running on larger server with more cores.
Scaling out is running a program across more servers.

Some software makes the conscious decision to not scale up and only scale out. A great example of this is Redis which made the conscious decision to be single threaded so it does not need to worry about locking or race conditions. Other programs are built with scale up in mind. A common architecture for this is the Share Nothing Architecture. Scylladb is a good example of this because it shards data across each physical core so you never have two different threads accessing the same data.

Most software lands somewhere in the middle and struggles scaling up. To evaluate why your program’s performance is not scaling linearly as you add more cores you should look at the following:

perf c2c finds shared memory which can become contested if multiple cores and threads are trying to read it simultaneously. This requires metal vm’s in AWS. For x86 it uses offcore PMU registers. For ARM it needs a newer linux perf binary (which can be built from source) so it can use SPE (Statistical Profiling Extension) which cannot give offcore information like HITM or remote DRAM % but can calculate latency of memory accesses which directly correlates to highly contended memory.
offcputime finds locks which can cause threads to idle as they wait for the lock to be released. It uses eBPF so it only works on kernels compiled with eBPF enabled.