Readit News logoReadit News
pizlonator · 2 years ago
It’s possible to write many other bump algorithms, including ones that bump upwards but generate much much better code than OP’s forward upward bump. In particular, no overflow checks are needed if you write it carefully enough.

My favorite is one where I only use subtraction but the direction of allocation is in the positive direction (I subtract a `remaining` counter, and the returned address is `end - remaining`).

But I have seen many others. A few of my colleagues have similarly geeked out on this and come up with splendid bumpers (ggaren wrote a great one in bmalloc, and I remember the MMTk folks put hella thought into theirs).

titzer · 2 years ago
In most situations, the allocation size is a constant and bumping upwards can be done without an overflow check because the region cannot be close enough to the upper part of memory to wrap around.

I'm surprised that there was no discussion of memory system performance. There are tons of OS-level and hardware prefetcher optimizations for forward-marching pointer references and zero-page allocation.

pizlonator · 2 years ago
You can bump upwards without an overflow check even if the size is variable.

I’ve seen multiple ways to do it. And I’ve lost days of my life to coming up with pointlessly amusing variants that get there in weird ways.

Yeah I’m also amused that they didn’t get into the fact that downward bump is just not what the HW expects you to do. You’d need a real benchmark to see that effect. And, it’s an open question whether this would matter unless your allocation rate was obscene (if it isn’t then most scans of memory will be normal loops and not the allocator).

ridiculous_fish · 2 years ago
The insight here is that, for unsigned values, rounding down to a multiple of N is cheaper than rounding up to a multiple of N.

In addition to simpler arithmetic, rounding down always succeeds - 0 is a multiple of all N - while rounding up may fail if a larger multiple does not exist. So you need to check in the round-up case.

benmanns · 2 years ago
Somewhat aside, I love the graphical display of the benchmark results: https://fitzgeraldnick.com/media/bumpalo-switch-to-downwards... --I think I'll start doing that when I run benchmarks in the future.
thethirdone · 2 years ago
Notably, that graph starts at 25 which seems misleading.
wrsh07 · 2 years ago
Eh it was clearly labeled, and I pretty quickly gleaned that it was like 5 microseconds faster
dev_dwarf · 2 years ago
In the "bump up" version you could remove both the checked_add branches and replace them with a single check at the end, making the amount of branches the same.

Quick example: https://godbolt.org/z/rdv4qnrs8.

*edited to update the example, realized I messed up the comparison logic.

ridiculous_fish · 2 years ago
I think this doesn't work because `aligned + size` may wrap all the way around into the valid region again. For example if aligned == ptr + 1, and size is usize::MAX, we will end up with new_ptr == ptr and the allocation will wrongly succeed.
dev_dwarf · 2 years ago
Interesting point. I modified my example to test what you described. I had to play with the compilation flags to get the allocs to not be optimized out and to not panic when the integer overflow happens, but otherwise I didn't change the logic. I'm pretty sure my implementation is correctly handling the case you mention, evidenced by it returning a null pointer.

Link: https://godbolt.org/z/f1jGW6Pa3

Update: NVM, definitely not being handled correctly. https://godbolt.org/z/cMTe1o979

loeg · 2 years ago
The straightforward answer is: don't tolerate ridiculous alignments.
sltkr · 2 years ago
That version is unsafe: what if size == 0xfff..fff and alignment is needed? You will end up with ptr <= new_ptr < end, seemingly a valid result, but actually with not enough space.

edit: code moved to a toplevel comment

loeg · 2 years ago
No allocator can be expected to allocate usize::MAX, so it doesn't really matter.
sltkr · 2 years ago
You can also implement BumpUp() with only two conditionals and no overflow like this (in C, because I don't know Rust, but the logic is identical):

    size_t alignment_needed = align - (ptr & (align - 1));
    if (alignment_needed > SIZE_MAX - size) return nullptr;
    size_t space_needed = size + alignment_needed;
    size_t space_remaining = end - ptr;
    if (space_needed > space_remaining) return nullptr;
    char *result = ptr + alignment_needed;
    ptr += space_needed;
    return result;

loeg · 2 years ago
Yeah, and you only need one check if you can restrict the allocation pointer to be at least max alignment away from SIZE_MAX, which is very easy to do in practice. Max alignment needed is usually no more than 64 bytes, but even 4kB isn't especially burdensome. On Linux/amd64, you don't even need to do anything special -- the high virtual address space is likely reserved for the kernel anyway.
dev_dwarf · 2 years ago
Nice. Seems to work for case you mentioned under my comment: https://godbolt.org/z/TYorcd8b6
gjm11 · 2 years ago
Since the title is less than perfectly perspicuous:

It's about "bump allocators", where you allocate memory just by incrementing/decrementing a pointer (and don't free it until you're ready to free up all the memory all at once).

These can either allocate "upwards", starting at low addresses and giving you memory at higher addresses for successive allocations, or "downwards", going the other way.

The claim being made is that "downwards" is better, because the bounds checks you need to do turn out to be more efficient that way; allocation ends up using fewer instructions, fewer registers, and fewer conditional jumps.

chrchang523 · 2 years ago
Incidentally, you can choose to bump in both directions. It's more complicated (you need to keep track of which end you allocated each data structure on), but in exchange, the allocator becomes sufficient for many more use cases.

Given a choice, the OP implies that you should position small-but-numerous allocations next to the top, and larger infrequent allocations next to the bottom.

dev_dwarf · 2 years ago
This sounds like it would make the alloc logic much more complicated and branch-y, defeating the purpose of bumping down anyway, unless your implying some compile-time way to do this.
justin_oaks · 2 years ago
I only recently learned about bump allocators. I was very confused as to how such an allocator could be useful. In reading the author's bumpalo crate [1] documentation, he cleared up my confusion.

For one thing, this particular allocator is separate from the general allocator so the calling code can choose when to use the bump allocator.

Quoting the crate doc:

> [B]ump allocation [is] well-suited for phase-oriented allocations. That is, a group of objects that will all be allocated during the same program phase, used, and then can all be deallocated together as a group.

I can see this being useful as a way to optimize allocation/deallocation speed for specific use-cases.

[1] https://docs.rs/bumpalo/latest/bumpalo/

loeg · 2 years ago
It's great when you don't need destructors and have a bunch of small objects to allocate with similar lifetimes.
ph4evers · 2 years ago
It is also used in WASM. I guess to increase memory safety
saagarjha · 2 years ago
Bump allocators are very simple in an environment where everything you want must be brought with you.
asalahli · 2 years ago
The rust code for rounding down is given as

    let new_ptr = new_ptr & !(align - 1);
But I don't see rdx decremented before being negated and AND'ed with rax, in the assembly. What am I missing?

conradludgate · 2 years ago
It's being converted into `-align` which is equivalent under twos compliment
asalahli · 2 years ago
I don't follow. Are you saying `new_ptr & -align` is equivalent to `new_ptr & !(align - 1)`?

Edit: I get it now. My brain was interpreting NEG as NOT for some reason.