willothy (u/willothy)

willothy commented on Show HN: Whirlwind – Async concurrent hashmap for Rust github.com/fortress-build... · Posted by u/willothy

jkelleyrtp · 10 months ago

Tokio has a task budget that will cap at 32 or 256 polls of the same task before switching to another task. So, yes a spinlock, but not likely to deadlock.

willothy · 10 months ago

Yeah, after looking into this more I think this was a big oversight on my part. Working on a (hopefully) better way of doing this right now - I'm thinking per-shard waker queues and only falling back to spinlocking like this if the queues are full.

willothy commented on Show HN: Whirlwind – Async concurrent hashmap for Rust github.com/fortress-build... · Posted by u/willothy

sujayakar · 10 months ago

+1. I'd be curious how much of a pessimization to uncontended workloads it'd be to just use `tokio::sync::RwLock`.

and, if we want to keep it as a spinlock, I'm curious how much the immediate wakeup compares to using `tokio::task::yield_now`: https://docs.rs/tokio/latest/tokio/task/fn.yield_now.html

willothy · 10 months ago

This is an interesting idea. I am gonna try this out - especially with dashmap, I think that could perform very well.

willothy commented on Show HN: Whirlwind – Async concurrent hashmap for Rust github.com/fortress-build... · Posted by u/willothy

conradludgate · 10 months ago

Expanding on this. If you have a lot of concurrent tasks, you will overflow[0] the task local queue and be bottlenecked by the global queue mutex[1]

[0]: https://github.com/tokio-rs/tokio/blob/8897885425bf3d89053f8... [1]: https://github.com/tokio-rs/tokio/blob/8897885425bf3d89053f8...

willothy · 10 months ago

Oh this is really good to know, thank you!

willothy commented on Show HN: Whirlwind – Async concurrent hashmap for Rust github.com/fortress-build... · Posted by u/willothy

conradludgate · 10 months ago

It does immediately put itself into the queue to be polled again. But that's no different in effect to a spin-lock. If you have other tasks in your runtime, this will be putting excess pressure on your scheduler

willothy · 10 months ago

Hmm, imo it's definitely better than directly spinlocking to have many spinlocks running cooperatively, but you're right that it may not be ideal. Thanks for pointing this out. I'll see if I can find a better way to coordinate the polling/waking of lock acquisition futures.

willothy commented on Show HN: Whirlwind – Async concurrent hashmap for Rust github.com/fortress-build... · Posted by u/willothy

conradludgate · 10 months ago

I don't think I'd recommend using this in production. The benchmarks look good, but by immediately waking the waker[0], you've effectively created a spin-lock. They may work in some very specific circumstances, but they will most likely in practice be more costly to your scheduler (which likely uses locks btw) than just using locks

[0]: https://github.com/fortress-build/whirlwind/blob/0e4ae5a2aba...

willothy · 10 months ago

I don't believe that waking the waker in `poll` synchronously waits / runs poll again immediately. I think it is more likely just adding the future back to the global queue to be polled. I could be wrong though, I'll look into this more. Thanks for the info!

willothy commented on Show HN: Whirlwind – Async concurrent hashmap for Rust github.com/fortress-build... · Posted by u/willothy

lesuorac · 10 months ago

I kind of like the "thin" stdlib approach tbh.

Rather than having 3-5 ways to do a say a GUI from the stdlib you have 0 and instead you can pick a dependency that suits your way.

That then boils down to even data structures, there's a bunch of trade-offs to be made where one is better than the other and if Rust had to maintain 30 different types of maps for each use case that seems like a waste of their time. Push that work off to the people that need those maps.

willothy · 10 months ago

I agree with this take a lot. I think having lots of custom implementations is inevitable for systems languages - the only reason why Rust is different is because Cargo/crates.io makes things available for everyone, where in C/C++ land you will often just DIY everything to avoid managing a bunch of dependencies by hand or with cmake/similar.

willothy commented on Show HN: Whirlwind – Async concurrent hashmap for Rust github.com/fortress-build... · Posted by u/willothy

KolmogorovComp · 10 months ago

Great crate! Why use a hashmap instead of a Btreemap which is usually advised in rust?

willothy · 10 months ago

There are a few reasons - For one, I'm not sure BTreeMap is always faster in Rust... it may be sometimes but lookups are still O(log(n)) due to the searching where with a HashMap it's (mostly) O(1). They both have their uses - I usually go for BTreeMap when I explicitly need the collection to be ordered.

A second reason is sharding - sharding based on a hash is quite simple to do, but sharding an ordered collection would be quite difficult since some reads would need to search across multiple shards and thus take multiple locks.

If you mean internally (like for each shard), we're using hashbrown's raw HashTable API because it allows us to manage hashing entirely ourselves, and avoid recomputing the hash when determining the shard and looking up a key within a shard.

willothy commented on Show HN: Whirlwind – Async concurrent hashmap for Rust github.com/fortress-build... · Posted by u/willothy

judofyr · 10 months ago

Very fascinating results though! The sharding approach is quite a simple way of doing more fine-grained locking, making sure that writes don't block all the reads. Pretty cool to see that it actually pays off even with the overhead of Tokie scheduling everything!

There might be some fairness concerns here? Since we're polling the lock (instead of adding ourselves to a queue) it could be the case that some requests are constantly "too late" to acquire it? Could be interesting to see the min/max/median/P99 of the requests themselves. It seems that Bustle only reports the average latency[1] which honestly doesn't tell us much more than the throughputs.

[1]: https://docs.rs/bustle/latest/bustle/struct.Measurement.html

willothy · 10 months ago

This is a great point too - I definitely want to run some more varied benchmarks to get a better idea of how this performs in different settings. We'll also be using it in prod soon, so we'll see how it does in a real use setting too :)

willothy commented on Show HN: Whirlwind – Async concurrent hashmap for Rust github.com/fortress-build... · Posted by u/willothy

James_K · 10 months ago

> This crate is in development, and breaking changes may be made up until a 1.0 release.

When will this happen? I imagine a lot of people who want use it might just forget about it if you say "do not use this until some unspecified point in the future".

willothy · 10 months ago

I think there's a pretty big difference between committing to semantic versioning and saying "do not use this until some unspecified point in the future." Maybe I'm just not clear enough in the note - I just mean that the API could change. But as long as a consumer doesn't use `version = "*"` in their Cargo.toml, breaking changes will always be opt-in and builds won't start failing if I release something with a big API change.

willothy commented on Show HN: Whirlwind – Async concurrent hashmap for Rust github.com/fortress-build... · Posted by u/willothy

lesuorac · 10 months ago

When all the examples are marked with `#[tokio::main]`, it probably requires tokio.

Although I guess they do implement Future on their own so it shouldn't need a specific runtime then.

https://github.com/fortress-build/whirlwind/blob/main/src/sh...

willothy · 10 months ago

Tokio is just used for async tests and the examples, the crate doesn’t depend on any specific async runtime :)