The significant decrease they talk about is a side effect of their chosen language having a GC. This means the strings take more work to deal with than expected.
To me this mostly shows that the often-small costs of individual operations do eventually add up. It's not entirely clear from the post where and when the GC cost is incurred, though; I'd presume on creation and destruction?
edit: There are tricks to not traverse a compound object every time, but assume that at least one of the 80M objects in that giant array gets modified in between GC activations.
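For a concrete (if blunt) example of such a trick, CPython has `gc.freeze()`, which moves everything currently tracked into a permanent generation that later collections skip. This is only a sketch of that one mechanism, not a claim about what the post's runtime does; `build_table` and `freeze_demo` are made-up names and the table is just a stand-in for the giant array.

```python
import gc


def build_table(n):
    """Hypothetical stand-in for the giant array: n small container objects."""
    return [[i] for i in range(n)]


def freeze_demo(n=100_000):
    table = build_table(n)
    gc.collect()                      # clear out existing garbage first
    before = gc.get_freeze_count()
    gc.freeze()                       # tracked objects go to the permanent generation
    frozen = gc.get_freeze_count() - before
    gc.unfreeze()                     # put them back so the process behaves normally
    return len(table), frozen


if __name__ == "__main__":
    size, frozen = freeze_demo()
    print(f"table entries: {size}, objects frozen: {frozen}")
```

Note that freezing does not care whether the objects get modified afterwards; the cyclic collector simply stops looking at them, so any reference cycles among frozen objects will not be reclaimed until `gc.unfreeze()` is called.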
How much of the total CPU cost goes to the GC depends entirely on the application, the GC implementation, and the language. Memory management overhead is famously hard to measure; GC in production is anywhere between 7-82% (Cai ISPASS2022). I measured about 19% geomean overhead on Python's pyperf benchmarks in accurate simulation, by ignoring the instructions involved in GC/MM.
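To show why this is hard to pin down, here is a rough sketch of timing CPython's cyclic collector through `gc.callbacks`. It is not the simulation method I mentioned above; `GcTimer` and `make_cycles` are hypothetical names, and the number it gives only covers the cyclic collector, so reference counting and allocator work stay invisible.

```python
import gc
import time


class GcTimer:
    """Accumulates wall-clock time spent inside CPython's cyclic collector."""

    def __init__(self):
        self.total_seconds = 0.0
        self.collections = 0
        self._started_at = None

    def _callback(self, phase, info):
        # CPython calls this with phase "start" before and "stop" after a collection.
        if phase == "start":
            self._started_at = time.perf_counter()
        elif phase == "stop" and self._started_at is not None:
            self.total_seconds += time.perf_counter() - self._started_at
            self.collections += 1
            self._started_at = None

    def __enter__(self):
        gc.callbacks.append(self._callback)
        return self

    def __exit__(self, *exc):
        gc.callbacks.remove(self._callback)
        return False


def make_cycles(n):
    """Hypothetical workload: allocate n small self-referencing lists."""
    for _ in range(n):
        node = []
        node.append(node)  # a cycle, so only the cyclic collector can free it


if __name__ == "__main__":
    begin = time.perf_counter()
    with GcTimer() as timer:
        make_cycles(200_000)
    elapsed = time.perf_counter() - begin
    print(f"{timer.collections} collections, "
          f"{timer.total_seconds:.4f}s of {elapsed:.4f}s in the cyclic GC")
```

Whatever share this prints is a lower bound on memory management cost for that workload, which is part of why published numbers vary so much.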