Readit News
awused commented on Was Rust Worth It?   jsoverson.medium.com/was-... · Posted by u/todsacerdoti
estebank · 2 years ago
Whenever you see weasel words like that, it means the compiler knows that the issue you encountered could be what it is saying, but the necessary metadata to know for sure is inaccessible to it. It's the problem with classical stage-oriented compilers. A compiler designed for diagnostics from the beginning would end up looking like a plate of spaghetti where you can call type checking from the parser, to give an example.
awused · 2 years ago
>It's the problem with classical stage-oriented compilers. A compiler designed for diagnostics from the beginning

I'm not sure any compiler, even one "designed for diagnostics," can gather the "necessary metadata" from inside my brain based on my original intentions. If the compiler could unambiguously interpret what I meant to write, it wouldn't have needed to fail with a compilation error in the first place. Whenever I see something like "perhaps" or "maybe" in a compilation error, the information just didn't exist anywhere the compiler could possibly get to, and it's just a suggestion based on common mistakes.

awused commented on Was Rust Worth It?   jsoverson.medium.com/was-... · Posted by u/todsacerdoti
pizza234 · 2 years ago
> At this point, I'm about as fast in Rust as I am in Python.

This is factually impossible.

For anything larger than (very) small programs, Rust requires an upfront design stage, due to ownership, that is not required when developing in GC'ed languages.

This is not even considering more local complexities, like data structures with cyclical references.

awused · 2 years ago
Honestly I found myself coding very much the same way in Rust as I did in Python and Go, which were my go-to hobby languages before. But instead of "this lock guards these fields" comments, the type system handles it. Ownership as a concept is something you need to follow in any language; otherwise you get problems like iterator invalidation, so it really shouldn't require an up-front architectural planning phase. Even for cyclic graphs, the biggest choice is whether you allow yourself to use a bit of unsafe for ergonomics or not.

Having a robust type system actually makes refactors a lot easier, so I have less up-front planning with Rust. My personal projects tend to creep up in scope over time, especially since I'm almost always doing something new in a domain I've not worked in. Whenever I've decided to change or redo a core design decision in Python or Go, it has always been a massive pain and usually "ends" with me finding edge cases in runtime crashes days or weeks later. When I've changed my mind in Rust it has, generally, ended once I get it building, and a few times I've had simple crashes from not dropping RefCell Refs.
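A minimal sketch of the RefCell failure mode mentioned above: holding a `Ref` across a later mutable borrow fails at runtime rather than compile time (shown non-fatally here with `try_borrow_mut` instead of the panicking `borrow_mut`).

```rust
use std::cell::RefCell;

fn main() {
    let cell = RefCell::new(vec![1, 2, 3]);

    // Holding an immutable Ref...
    let r = cell.borrow();
    // ...makes a mutable borrow fail at runtime instead of compile time.
    assert!(cell.try_borrow_mut().is_err());
    drop(r); // once the Ref is dropped, mutation works again

    cell.borrow_mut().push(4);
    assert_eq!(cell.borrow().len(), 4);
}
```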

awused commented on Was Rust Worth It?   jsoverson.medium.com/was-... · Posted by u/todsacerdoti
robertlagrant · 2 years ago
> Rust's type system forces you to deal with the problem early on and saves time towards the end. It's not like that's impossible with Python with addons like mypy.

Definitely not - mypy's pretty good these days, and lots of people use it.

> But Rust's type system goes beyond just data types - lifetimes are also a part of the type system. I don't know how you can tack that on to Python.

Well, Python's objects are generally garbage collected rather than explicitly destroyed, so I don't think it'd make sense to have lifetimes? They don't seem a correctness thing in the same way that types do.

awused · 2 years ago
Lifetimes and borrowing are very much a correctness thing and aren't just for tracking when memory is freed. While you won't have use-after-free issues in a GCed language, you will still have all the other problems of concurrent modification (data races) that they prevent. This is true even in single-threaded code, with problems like iterator invalidation.
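A small illustration of the single-threaded case: the borrow checker rejects mutating a collection while iterating it, which is exactly the aliasing bug a GC does nothing about.

```rust
fn main() {
    let mut v = vec![1, 2, 3];

    // This would be iterator invalidation, and it does not compile:
    //     for x in &v { if *x == 2 { v.push(99); } }
    // error[E0502]: cannot borrow `v` as mutable because it is also
    // borrowed as immutable.

    // The pattern the borrow checker pushes you toward instead:
    // finish the read, then mutate.
    let to_add: Vec<i32> = v.iter().filter(|&&x| x == 2).map(|&x| x + 97).collect();
    v.extend(to_add);
    assert_eq!(v, vec![1, 2, 3, 99]);
}
```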
awused commented on Maybe Rust isn’t a good tool for massively concurrent, userspace software   bitbashing.io/async-rust.... · Posted by u/mrkline
anonymoushn · 2 years ago
For io_uring all the reads go through io_uring and generally don't send back a result until some data is ready. So you'll receive a single stream of syscall results in which the results for all fds are interleaved, and you won't even be able to write code that has one task doing I/O starving other tasks. For epoll, polling the epoll instance is how you get notified of the readiness for all the other fds too. But the important thing isn't to poll the socket that you know is ready, it's to yield to runtime at all, so that other tasks can be resumed. Amusingly upon reading the rest of the blog post I discovered that this is exactly what tokio does. It just always yields after a certain number of operations that could yield. It doesn't implement preemption.
awused · 2 years ago
Honestly I assumed you had read the article and were just confused about how tokio was pretending to have preemption. Now you reveal you hadn't read the article so now I'm confused about you in general, it seems like a waste of time. But I'm glad you're at least on the same page now, about how checking if something is ready and yielding to the runtime are separate things.
awused commented on Maybe Rust isn’t a good tool for massively concurrent, userspace software   bitbashing.io/async-rust.... · Posted by u/mrkline
anonymoushn · 2 years ago
> You're confusing IO not happening because it's not needed with IO never happening. Just because a method can perform IO doesn't mean it actually does every time you call it. If I call async_read(N) for the next N bytes, that isn't necessarily going to touch the IO driver.

Maybe you can read the linked post again? The problem in the example in the post is that data keeps coming from the network. If you were to strace the program, you would see it calling read(2) repeatedly. The runtime chooses to starve all other tasks as long as these reads return more than 0 bytes. This is obviously not the only option available.

I apologize for charitably assuming that you were correct in the rest of my reply and attempting to fill in the necessary circumstances which would have made you correct

awused · 2 years ago
Actually, no, I misread it while trying to make sense of what you were posting, so this post is edited.

This is just mundane non-blocking sockets. If the socket never needs to block, it won't yield. Why go through epoll/uring unless it returns EWOULDBLOCK?
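The non-blocking pattern described here can be sketched with a plain std UDP socket: a read on an empty socket fails immediately with `WouldBlock` (EWOULDBLOCK), and only at that point would an async runtime register the fd with epoll/io_uring and park the task.

```rust
use std::io::ErrorKind;
use std::net::UdpSocket;

fn main() -> std::io::Result<()> {
    let sock = UdpSocket::bind("127.0.0.1:0")?;
    sock.set_nonblocking(true)?;

    let mut buf = [0u8; 64];
    // Nothing has been sent, so the read fails immediately with WouldBlock
    // instead of blocking; this is the signal to go register with the poller.
    match sock.recv(&mut buf) {
        Err(e) => assert_eq!(e.kind(), ErrorKind::WouldBlock),
        Ok(_) => unreachable!("no data was sent"),
    }
    Ok(())
}
```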

awused commented on Maybe Rust isn’t a good tool for massively concurrent, userspace software   bitbashing.io/async-rust.... · Posted by u/mrkline
Pauan · 2 years ago
I never said that ALL abstractions in Rust are zero-cost, though the vast majority of them are, and you actually have to explicitly go out of your way to use non-zero-cost abstractions.
awused · 2 years ago
Are you sure about that?

>Rust embraces abstractions because Rust abstractions are zero-cost. So you can liberally create them and use them without paying a runtime cost.

>you never need to do a cost-benefit analysis in your head, abstractions are just always a good idea in Rust

Again though, and ignoring that, "zero-cost abstraction" can be very narrow and context-specific, so you really don't need to go out of your way to find "costly" abstractions in Rust. As an example, if you have any uses of Rc that don't use weak references, then Rc is not zero-cost for those uses. This is rarely something to worry about, but rarely is not never, and it's going to be more common the more abstractions you roll yourself.
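To make the Rc point concrete: every clone and drop of an `Rc` adjusts a reference count on the heap, and the allocation always carries a weak count too, whether or not you ever create a `Weak`. A rough sketch:

```rust
use std::rc::Rc;

fn main() {
    let a = Rc::new(String::from("shared"));
    // Each clone is a refcount increment at runtime, not a no-op.
    let b = Rc::clone(&a);
    assert_eq!(Rc::strong_count(&a), 2);
    drop(b); // and each drop is a decrement plus a check against zero
    assert_eq!(Rc::strong_count(&a), 1);
    // If no Weak is ever created, the weak count and the count updates
    // are pure overhead compared to a plain Box<String>.
}
```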

awused commented on Maybe Rust isn’t a good tool for massively concurrent, userspace software   bitbashing.io/async-rust.... · Posted by u/mrkline
anonymoushn · 2 years ago
Reading from a socket, as in the linked post, is an example of not performing I/O? I'm not familiar with tokio so I did not know that it maintained buffers in userspace and filled them before the user called read(), but this is unimportant, it could still have read() yield and return the contents of the buffer.

I assumed that users would issue reads of like megabytes at a time and usually receive less. Does the example of reading from a socket in the blog post presuppose a gigabyte-sized buffer? It sounds like a bigger problem with the program is the per-connection memory overhead in that case.

The proposal is obviously not to yield 1 million times before returning a 1 meg buffer or to call read(2) passing a buffer length of 1, is this trolling? The proposal is also not some imaginary pie-in-the-sky idea; it's currently trading millions of dollars of derivatives daily on a single thread.

awused · 2 years ago
You're confusing IO not happening because it's not needed with IO never happening. Just because a method can perform IO doesn't mean it actually does every time you call it. If I call async_read(N) for the next N bytes, that isn't necessarily going to touch the IO driver. If your task can make progress without polling, it doesn't need to poll.

>I'm not familiar with tokio so I did not know that it maintained buffers in userspace

Most async runtimes are going to do buffering on some level, for efficiency if nothing else. It's not strictly required, but you've had an unusual experience if you've never seen buffering.

>filled them before the user called read()

Where did you get this idea? Since you seem to be quick to accuse others of it, this does seem like trolling. At the very least it's completely out of nowhere.

>it could still have read() yield and return the contents of the buffer.

If I call a read_one_byte, read_line, or read(N) method and it returns past the end of the requested content, that would be a problem.

>I assumed that users would issue reads of like megabytes at a time and usually receive less.

Reading from a channel is the other easy example, if files were hard to follow. The channel read might be implemented as a quick atomic check to see if something is available and consume it, only yielding to the runtime if it needs to wait. If a producer on the other end is producing things faster than the consumer can consume them, the consuming task will never yield. You can implement a channel read method that always yields, but again, that'd be slow.
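A sketch of that channel behavior with std's mpsc: `try_recv` is just a quick check, so a consumer that keeps finding items queued never has to block (and, in an async runtime, would never yield).

```rust
use std::sync::mpsc;

fn main() {
    let (tx, rx) = mpsc::channel();
    // A fast producer has already queued several items...
    for i in 0..3 {
        tx.send(i).unwrap();
    }
    // ...so the consumer drains them with cheap non-blocking checks,
    // never reaching the "nothing available, go wait" path.
    let mut got = Vec::new();
    while let Ok(v) = rx.try_recv() {
        got.push(v);
    }
    assert_eq!(got, vec![0, 1, 2]);
}
```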

>The proposal is obviously not to yield 1 million times before returning a 1 meg buffer, is this trolling

No, giving an illustrative example is not trolling, even if I kept the numbers simple to make it easy to follow. But your flailing about with the idea of requiring gigabyte-sized buffers probably is.

awused commented on Maybe Rust isn’t a good tool for massively concurrent, userspace software   bitbashing.io/async-rust.... · Posted by u/mrkline
anonymoushn · 2 years ago
> So just like any other kind of scheduling?

Yes. Industries that care about latency take some pains to avoid this as well, of course.

> io_uring and epoll have nothing to do with it and can't avoid the problem: the problem is with code that can make progress and doesn't need to poll for anything.

They totally can though? If I write the exact same code that is called out as problematic in the post, my non-preemptive runtime will run a variety of tasks while non-preemptive tokio is claimed to run only one. This is because my `accept` method would either submit an "accept sqe" to io_uring and swap to the runtime or do nothing and swap to the runtime (in the case of a multishot accept). Then the runtime would continue processing all cqes in order received, not *only* the `accept` cqes. The tokio `accept` method and event loop could also avoid starving other tasks if the `accept` method was guaranteed to poll at least some portion of the time and all ready handlers from one poll were guaranteed to be called before polling again.

This sort of design solves the problem for any case of "My task that is performing I/O through my runtime is starving my other tasks." The remaining tasks that can starve other tasks are those that perform I/O by bypassing the runtime and those that spend a long time performing computations with no I/O. The former thing sounds like self-sabotage by the user, but unfortunately the latter thing probably requires the user to spend some effort on designing their program.

> The problem isn't unique to Rust either, and it's going to exist in any cooperative multitasking system: if you rely on tasks to yield by themselves, some won't.

If we leave the obvious defects in our software, we will continue running software with obvious defects in it, yes.

awused · 2 years ago
>This sort of design solves the problem for any case of "My task that is performing I/O through my runtime is starving my other tasks."

Yeah, there's your misunderstanding: you've got it backwards. The problem being described occurs when I/O isn't happening because it isn't needed; there isn't a problem when I/O does need to happen.

Think of buffered reading of a file, maybe a small one that fully fits into the buffer, and reading it one byte at a time. Reading the first byte will block and go through epoll/io_uring/kqueue to fill the buffer and other tasks can run, but subsequent calls won't and they can return immediately without ever needing to touch the poller. Or maybe it's waiting on a channel in a loop, but the producer of that channel pushed more content onto it before the consumer was done so no blocking is needed.

You can solve this by never writing tasks that can take "a lot" of time, or "continue", whatever that means, but that's pretty inefficient in its own right. If my theoretical file reading task is explicitly yielding to the runtime on every byte by calling yield(), it is going to be very slow. You're not going to go through io_uring for every single byte of a file individually when running "while next_byte = async_read_next_byte(file) {}" code in any language if you have heap memory available to buffer it.
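The buffered-file point can be demonstrated by counting how often the underlying reader actually gets hit. The `CountingReader` wrapper here is hypothetical, standing in for syscalls: only the first byte-read (plus the final EOF check) reaches it, everything else is served from the buffer without going near a poller.

```rust
use std::io::{BufReader, Read};

// Hypothetical wrapper counting calls into the underlying reader,
// standing in for actual read(2) syscalls.
struct CountingReader<R> {
    inner: R,
    calls: usize,
}

impl<R: Read> Read for CountingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> std::io::Result<usize> {
        self.calls += 1;
        self.inner.read(buf)
    }
}

fn main() {
    let data = vec![7u8; 1024]; // a "file" that fits in one buffer fill
    let mut reader = BufReader::new(CountingReader { inner: &data[..], calls: 0 });

    // Read it one byte at a time, as in the example above.
    let mut byte = [0u8; 1];
    let mut total = 0;
    while reader.read(&mut byte).unwrap() == 1 {
        total += 1;
    }

    assert_eq!(total, 1024);
    // One call filled the buffer, one more hit EOF; the other 1023
    // byte-reads came straight from the in-memory buffer.
    assert!(reader.get_ref().calls <= 2);
}
```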

awused commented on Maybe Rust isn’t a good tool for massively concurrent, userspace software   bitbashing.io/async-rust.... · Posted by u/mrkline
anonymoushn · 2 years ago
Trying to solve the problem by frequently invoking signal handlers will also show in your latency distribution!

I guess if someone wants to use futures as if they were goroutines then it's not a bug, but this sort of presupposes that an opinionated runtime is already shooting signals at itself. Fundamentally the language gives you a primitive for switching execution between one context and another, and the premise of the program is probably that execution will switch back pretty quickly from work related to any single task.

I read the blog about this situation at https://tokio.rs/blog/2020-04-preemption which is equally baffling. The described problem cannot even happen in the "runtime" I'm currently using because io_uring won't just completely stop responding to other kinds of sqe's and only give you responses to a multishot accept when a lot of connections are coming in. I strongly suspect equivalent results are achievable with epoll.

awused · 2 years ago
>Trying to solve the problem by frequently invoking signal handlers will also show in your latency distribution!

So just like any other kind of scheduling? "Frequently" is also very subjective, and there are tradeoffs between throughput, latency, and especially tail latency. You can improve throughput and minimum latency by never preempting tasks, but it's bad for average, median, and tail latency when longer tasks starve others, otherwise SCHED_FIFO would be the default for Linux.

>I read the blog about this situation at https://tokio.rs/blog/2020-04-preemption which is equally baffling

You've misunderstood the problem somehow. There is definitely nothing about tokio (which uses epoll on Linux and can use io_uring) not responding in there. io_uring and epoll have nothing to do with it and can't avoid the problem: the problem is with code that can make progress and doesn't need to poll for anything. The problem isn't unique to Rust either, and it's going to exist in any cooperative multitasking system: if you rely on tasks to yield by themselves, some won't.

awused commented on Maybe Rust isn’t a good tool for massively concurrent, userspace software   bitbashing.io/async-rust.... · Posted by u/mrkline
Pauan · 2 years ago
I am very aware of the definition of zero-cost.

We're talking about the comparison between using an abstraction vs not using an abstraction.

When I said "doesn't have a runtime cost", I meant "the abstraction doesn't have a runtime cost compared to not using the abstraction".

If you want your computer to do anything useful, then you have to write code, and that code has a runtime cost.

That runtime cost is unavoidable, it is a simple necessity of the computer doing useful work, regardless of whether you use an abstraction or not.

Whenever you create or use an abstraction, you do a cost-benefit analysis in your head: "does this abstraction provide enough value to justify the EXTRA cost of the abstraction?"

But if there is no extra cost, then the abstraction is free, it is truly zero cost, because the code needed to be written no matter what, and the abstraction is the same speed as not using the abstraction. So there is no cost-benefit analysis, because the abstraction is always worth it.

awused · 2 years ago
The way you used it in your parent comment didn't make it clear that you were using it properly, hence my clarification. I'm honestly still not sure you've got it right, because Rust abstractions, in general, are not zero-cost. Rust has some zero-cost abstractions in the standard library and Rust has made choices, like monomorphization for generics, that make writing zero-cost abstractions easier and more common in the ecosystem. But there's nothing in the language or compiler that forces all abstractions written in Rust to be free of extra runtime costs.
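A sketch of that monomorphization point: a generic function is compiled into a specialized copy per concrete type, so the call is static and inlinable (zero-cost), while the same abstraction as a trait object routes through a fat pointer and a vtable, a real (if usually tiny) runtime cost.

```rust
use std::mem::size_of;

trait Greet {
    fn greet(&self) -> String;
}

struct World;
impl Greet for World {
    fn greet(&self) -> String {
        "hello".to_string()
    }
}

// Monomorphized: the compiler emits a specialized copy per T,
// so the call is direct and can be inlined.
fn greet_static<T: Greet>(g: &T) -> String {
    g.greet()
}

// Dynamic dispatch: the call goes through a vtable behind a fat pointer.
fn greet_dyn(g: &dyn Greet) -> String {
    g.greet()
}

fn main() {
    let w = World;
    assert_eq!(greet_static(&w), greet_dyn(&w));
    // The trait-object reference carries a data pointer plus a vtable
    // pointer: twice the size of a plain reference.
    assert_eq!(size_of::<&dyn Greet>(), 2 * size_of::<&World>());
}
```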
