> Since the state is stored in a multi-level Bloom Filter style data structure, the memory needed is constant and does not scale with the number of clients.
Constant memory, but those hashes will take up CPU cycles. If you're running a workload that completes sub 20 milliseconds, these cycles spent hashing may not be worth it over, say, a constant-time admission control like token bucket.
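For contrast, a token-bucket admission check is a couple of arithmetic operations per request. This is a generic single-bucket sketch (names and parameters are mine, not FAIR's or any particular library's API):

```go
package main

import (
	"fmt"
	"time"
)

// TokenBucket is a minimal token bucket: at most `capacity` tokens,
// refilled continuously at `rate` tokens per second.
// A generic O(1) admission check, not FAIR's implementation.
type TokenBucket struct {
	capacity float64
	tokens   float64
	rate     float64   // tokens added per second
	last     time.Time // last refill timestamp
}

func NewTokenBucket(capacity, rate float64) *TokenBucket {
	return &TokenBucket{capacity: capacity, tokens: capacity, rate: rate, last: time.Now()}
}

// Allow refills the bucket based on elapsed time, then spends one token
// if available. No hashing, no per-request allocation.
func (b *TokenBucket) Allow() bool {
	now := time.Now()
	b.tokens += now.Sub(b.last).Seconds() * b.rate
	if b.tokens > b.capacity {
		b.tokens = b.capacity
	}
	b.last = now
	if b.tokens >= 1 {
		b.tokens--
		return true
	}
	return false
}

func main() {
	b := NewTokenBucket(2, 1) // burst of 2, refill 1 token/sec
	fmt.Println(b.Allow())    // true
	fmt.Println(b.Allow())    // true
	fmt.Println(b.Allow())    // false: bucket drained
}
```

The tradeoff is memory: you need one of these structs per client, which is exactly what the bloom-filter approach avoids.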
Absolutely my thought as well. During my thesis I found token buckets to be better and fairer than the other algorithms [1]. This was in libvirt. My finding was that it scaled well up to around 1k buckets, if memory serves; after that the results were weird, to say the least (of course, the servers I tested on were quite old and didn't have much memory). It would be nice to run my thesis tests again and see what has changed in the last decade.
Not just CPU-bound, but IO-bound workloads can also complete within milliseconds.
> have they claimed that?
The mention of "token bucket" in the project readme is why I wrote the comment I did.
... FAIR [only throttles] when there's a genuine shortage of resources as opposed to the approaches like token bucket or leaky bucket which may reject requests ...
That's the typical design for a token-bucket rate limiter. A token-bucket limiter uses more memory but is a simpler design. I believe this bloom-filter based implementation is designed to be more memory efficient at the cost of being less CPU efficient.
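To illustrate the general idea (this is a count-min sketch, a relative of the multi-level Bloom filter described in the readme, not FAIR's actual data structure): per-client counts are hashed into a few fixed-size arrays, so memory stays constant no matter how many clients appear, at the price of several hashes per operation and some overcounting on collisions.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// countMinSketch approximates per-key counts in fixed memory:
// depth hash rows of width counters, so memory is depth*width
// regardless of how many distinct keys are seen.
// A generic sketch for illustration, not FAIR's structure.
type countMinSketch struct {
	rows [][]uint32
}

func newSketch(depth, width int) *countMinSketch {
	rows := make([][]uint32, depth)
	for i := range rows {
		rows[i] = make([]uint32, width)
	}
	return &countMinSketch{rows: rows}
}

func (s *countMinSketch) index(row int, key string) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	h.Write([]byte{byte(row)}) // cheap per-row salt
	return int(h.Sum32()) % len(s.rows[row])
}

// Add increments one counter per row: depth hashes per call is the
// CPU cost paid for the bounded memory footprint.
func (s *countMinSketch) Add(key string) {
	for i := range s.rows {
		s.rows[i][s.index(i, key)]++
	}
}

// Count takes the minimum across rows; collisions can only inflate
// counts, so the estimate is never below the true count.
func (s *countMinSketch) Count(key string) uint32 {
	min := s.rows[0][s.index(0, key)]
	for i := 1; i < len(s.rows); i++ {
		if c := s.rows[i][s.index(i, key)]; c < min {
			min = c
		}
	}
	return min
}

func main() {
	s := newSketch(4, 1024)
	for i := 0; i < 5; i++ {
		s.Add("tenant-a")
	}
	fmt.Println(s.Count("tenant-a") >= 5) // true: never underestimates
}
```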
Thinking this could be useful in a multi-tenant service where you need to fairly allocate job-processing capacity across tenants over a pool of background workers (data export API requests, encoding requests, etc.).
That was my first thought as well. However, in a lot of real world cases, what matters is not the frequency of requests, but the duration of the jobs. For instance, one client might request a job that takes minutes or hours to complete, while another may only have requests that take a couple of seconds to complete. I don't think this library handles such cases.
The text suggests a method for managing GPU or rate-limited resources across multiple clients. It highlights the problem of spiky workloads, where a client might generate a large number of events (e.g., from a CSV upload), causing resource starvation. The text advises against naive solutions like FIFO, which could disadvantage clients with steady live traffic.
Rate limiters are used to protect servers from overload and to prevent attackers--or even legitimate but unintentionally greedy tenants--from starving other tenants of resources. They are a key component of a resilient distributed system.
Very interesting and looking forward to trying this. I am a big fan of SFB for this type of stuff but haven't seen anything in distributed space that's beyond a science project. Would be great if I can use it on my 10k+ user web site.
I coded up something like Fair 1-2 years ago: https://github.com/JensRantil/conc It's alpha software, but maybe it's interesting to someone.
Shouldn’t this be built into a queue somehow? I’d love to see a queuing solution like SQS that has built-in fairness, where you can fully utilize capacity, but as soon as customers compete for resources, some fairness kicks in.
Is there anything like that?
They are complementary solutions, not substitutes. Load balancers distribute traffic among servers and are a key component to enable horizontal scalability. Throttling is a prophylactic feature that prevents attackers or greedy consumers from overutilizing capacity and depriving it from other legitimate users. This library relates to the latter.
Unfortunately the title of the GitHub repo ("A Go library for serving resources fairly") is misleading. This is not a server; it's a library that a server can utilize to determine whether a request has exceeded fairness bounds and should be rejected with an HTTP 429 (too many requests) response.
In a multi-tenant system, you might have one customer who drops a big job that creates a huge number of tasks. We'd like to process this as fast as possible, but not block the tasks of small jobs from other customers, which should normally be completed very quickly.
We had to build something similar. We have both huge tenants (200k users in one tenant) and small tenants with 10 users. Sometimes there are spikes when a large tenant generates thousands of jobs. In a naive implementation, one tenant would be able to completely block job processing for all other tenants. We have to make sure the next job is picked from a different tenant each time, so that all tenants are served fairly. However, a large tenant may then end up waiting too long for its jobs to complete. In that case, we move such a tenant to different infrastructure (sometimes fully dedicated).
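The "pick the next job from a different tenant each time" scheme can be sketched as round-robin over per-tenant queues. This is my own illustrative version of the idea described above, not the commenter's actual code:

```go
package main

import "fmt"

// tenantQueues round-robins job selection across tenants so one large
// tenant cannot starve the rest. A sketch, not production code: no
// locking, no persistence, tenants are never retired from rotation.
type tenantQueues struct {
	order  []string            // tenants in rotation order
	queues map[string][]string // tenant -> pending jobs
	next   int                 // rotation cursor
}

func newTenantQueues() *tenantQueues {
	return &tenantQueues{queues: map[string][]string{}}
}

func (t *tenantQueues) Enqueue(tenant, job string) {
	if _, ok := t.queues[tenant]; !ok {
		t.order = append(t.order, tenant)
	}
	t.queues[tenant] = append(t.queues[tenant], job)
}

// Dequeue pops one job from the next tenant in rotation that has work,
// then advances the cursor past that tenant.
func (t *tenantQueues) Dequeue() (string, bool) {
	for i := 0; i < len(t.order); i++ {
		tenant := t.order[(t.next+i)%len(t.order)]
		if q := t.queues[tenant]; len(q) > 0 {
			t.queues[tenant] = q[1:]
			t.next = (t.next + i + 1) % len(t.order)
			return q[0], true
		}
	}
	return "", false
}

func main() {
	q := newTenantQueues()
	q.Enqueue("big", "b1")
	q.Enqueue("big", "b2")
	q.Enqueue("big", "b3")
	q.Enqueue("small", "s1")
	for {
		job, ok := q.Dequeue()
		if !ok {
			break
		}
		fmt.Println(job) // b1, s1, b2, b3: small tenant is not starved
	}
}
```

Note this interleaving is exactly why the large tenant's total latency grows, which is what motivates moving it to dedicated infrastructure.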
Isn't this exactly the problem that preemptive multitasking was built for? For example, any program built on the BEAM [1] presumably wouldn't have this problem. Do most languages not have a standard solution for this?
Those are completely unrelated. K8s does not provide client rate-control and FAIR does not do load-balancing. It appears you misunderstood what this is.
I tried writing another algorithm for network splitting but didn't get any better results. [1]: https://www.csg.uzh.ch/publications/details.php?id=1007
As usual, tradeoffs are everywhere.
I am very intrigued to find out how this would fit in, if at all.
See, e.g., https://docs.aws.amazon.com/wellarchitected/latest/framework...
This project, however, looks like a concurrency limiter, not a rate limiter. I'm also not sure how it works across a load-balanced cluster.
[1]: https://en.m.wikipedia.org/wiki/BEAM_(Erlang_virtual_machine...