ascar · 3 years ago
I see a lot of confusion between parallel programming [1] and concurrent programming [2] in the comments here.

The former, which is what this book is about, deals with the problem of parallelizing a single sequential program. There is usually strong interaction or dependency between elements, and progress needs synchronization. E.g. timestep iterations in real-time simulations that need synchronization with data communication after each timestep. These simulations also tend to get way too big to run on a single machine, let alone a single thread, and get scaled up to millions of cores/threads on supercomputers.

Concurrent programming is what most developers working with the internet are more familiar with. You have mostly independent tasks that you want to run concurrently. "A concurrent system is one where a computation can advance without waiting for all other computations to complete." [2] E.g. nginx serving thousands of user requests at the same time.

The problem domains have a lot of overlap on the basics (e.g. threading), but the focus is very different. Things like synchronization (mutexes, barriers), cache locality and memory bandwidth & latency play a central role in parallel programming, while concurrent programming focuses more on the engineering challenge of distributing independent tasks across multiple threads or machines.

[1] https://en.wikipedia.org/wiki/Parallel_computing

[2] https://en.wikipedia.org/wiki/Concurrent_computing

vivegi · 3 years ago
The book covers it in Appendix A.6 (p 424) in the v2023.06.11a PDF file.

> A.6 What is the Difference Between “Concurrent” and “Parallel”?

> From a classic computing perspective, “concurrent” and “parallel” are clearly synonyms. However, this has not stopped many people from drawing distinctions between the two, and it turns out that these distinctions can be understood from a couple of different perspectives.

> The first perspective treats “parallel” as an abbreviation for “data parallel”, and treats “concurrent” as pretty much everything else. From this perspective, in parallel computing, each partition of the overall problem can proceed completely independently, with no communication with other partitions. In this case, little or no coordination among partitions is required. In contrast, concurrent computing might well have tight interdependencies, in the form of contended locks, transactions, or other synchronization mechanisms.

> This of course begs the question of why such a distinction matters, which brings us to the second perspective, that of the underlying scheduler. Schedulers come in a wide range of complexities and capabilities, and as a rough rule of thumb, the more tightly and irregularly a set of parallel processes communicate, the higher the level of sophistication required from the scheduler. As such, parallel computing’s avoidance of interdependencies means that parallel-computing programs run well on the least-capable schedulers. In fact, a pure parallel-computing program can run successfully after being arbitrarily subdivided and interleaved onto a uniprocessor. In contrast, concurrent computing programs might well require extreme subtlety on the part of the scheduler.

ascar · 3 years ago
Well, funnily enough, this reads in contrast to the definitions used on Wikipedia, which are the ones I am familiar with (I also teach a class called "Parallel Programming" to graduates).

I do think the differentiation makes sense from a perspective of problem classes, as is also evident from the comments here. Running independent problems in parallel to better utilize hardware resources is very different from running problems in parallel in timesteps that have strong dependencies with regard to the progress of the overall computation. And that's not a problem of the scheduler, but a much more general concept.

It doesn't sound to me like the author has in mind the web-service parallelism/concurrency that is very apparent in the comments here.

preseinger · 3 years ago
that description is... not accurate

concurrent is about logical independence, parallel is about physical independence

crabbone · 3 years ago
I don't know who invented this nonsense distinction. The first time I was introduced to this idea of "concurrent programming" being a separate thing was when Go was released. So I associate this nonsense with Go, but it could have happened earlier; I simply never heard about it before then.

Anyways. The way I see it used today, it's applied to language runtimes incapable of, or severely crippled at, parallel / concurrent execution, e.g. Python, JavaScript, etc. In such environments programmers are offered a mechanism that has many downsides of parallel / concurrent programming (e.g. unpredictable order of execution) without the benefits of parallel / concurrent programming (i.e. nothing actually happens at the same time, or only sleep is possible at the same time, etc.)

I feel like this distinction, while a nonsense idea at its core, became so popular due to the popularity of language runtimes with disabilities and their users needing to validate their worth by adding features to their languages their runtimes are inherently incapable of implementing.

A similar situation happened with ML-style types. Python, for example, works very poorly with this add-on, but the desire to match features of other languages led Python developers to add those types anyways. Similarly TypeScript and a bunch of similar languages, especially on the Web.

vore · 3 years ago
How is this nonsense or anything to do with "language runtimes with disabilities"? An OS running on a single core processor cannot be parallel but it may be concurrent: it can never physically do two things at the same time, but it might be able to logically interleave different tasks.

Parallelism is a physical thing, concurrency is a logical thing.

phkahler · 3 years ago
>> I don't know who invented this nonsense distinction ...

It's not nonsense. In C or C++ lots of code can be made parallel using OpenMP by inserting some #pragma statements above for loops. This does not work for things like running a UI in one thread and some other work in another thread, perhaps displaying results as they are found. These are quite different types of parallelism.
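
For anyone who hasn't seen it, a minimal sketch of what that looks like (array names and sizes made up for illustration):

    // Build with: gcc -O2 -fopenmp vecadd.c
    #include <stdio.h>

    #define N 1000000

    double a[N], b[N], c[N];

    int main(void) {
        for (int i = 0; i < N; i++) {
            a[i] = i * 0.5;
            b[i] = i * 2.0;
        }

        // Each iteration is independent, so this one pragma is enough
        // to split the loop across all available cores.
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            c[i] = a[i] + b[i];

        printf("c[42] = %f\n", c[42]);
        return 0;
    }

Without -fopenmp the pragma is simply ignored and the program runs sequentially, which is a big part of OpenMP's appeal. The UI-plus-worker case has no such one-line escape hatch.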

saltcured · 3 years ago
I'm not sure I can identify when it started, but these were already the concepts commonly in use when I did my CS undergraduate work in the early 90s. I.e. it was in textbooks and course titles as established jargon.

Concurrency was the kind of thing worried about in OS design or Unix programming styles whether on a time-sharing system or some small scale multi-processing system. Coordination of heterogeneous sequential programs on some shared resources.

Parallelism was the topic of high-performance computing with combined use of many hardware resources to accelerate a single algorithm.

Of course these are simplifying abstractions, and real systems can get into the murky gray area that is both concurrent and parallel.

AmpsterMan · 3 years ago
Rob Pike has a discussion on the distinctions between parallelism and concurrency. Concurrency is closely related to co-routines which are a distinct invention from threads/processes which are more related to parallelism.
mmcnl · 3 years ago
Just because you didn't know about the concept doesn't mean the distinction is nonsense. I think they are similar but not the same, exactly for the reasons laid out in the comment you are replying to. Just because you hate the languages that support concurrent programming doesn't mean concurrent programming is meaningless. Any language using async/await (basically all of them these days) supports concurrent programming, including languages such as Swift and C# which are nothing like Python or JavaScript.
sublinear · 3 years ago
> In such environments programmers are offered a mechanism that has many downsides of parallel / concurrent programming (e.g. unpredictable order of execution) without the benefits of parallel / concurrent programming (i.e. nothing actually happens at the same time, or only sleep is possible at the same time, etc.)

From the developer's perspective it's a massive upside to not have to manage low-level details and just define how the event loop will call their code.

Any modern web browser has plenty of parallel execution behind the scenes, but the developer (and user) will just see concurrency which is much simpler to reason about. The order of execution doesn't matter if the things being executed aren't dependent. What matters more is that there's only one thread to think about. If they are dependent they shouldn't have been parallelized in the first place, so they're not.

aidenn0 · 3 years ago
I (and apparently the author of TFA) disagree with your definition of parallel programming. TFA gives the example of "embarrassingly parallel" programs as one way to make parallel programming simple.

The distinction I learned was: any time you have multiple logical threads of execution you have concurrency, any time you have multiple computations happening simultaneously, you have parallelism.

Multithreaded programming on a single core computer is concurrent, but not parallel. Vector processing is parallel, but not concurrent.

ascar · 3 years ago
> The distinction I learned was: any time you have multiple logical threads of execution you have concurrency, any time you have multiple computations happening simultaneously, you have parallelism.

I like this distinction as it also splits the different problem domains quite well. And I don't think it contradicts my definition as much as you might think.

When you have an embarrassingly parallel program you do not have to deal with the problems that come from data dependencies and synchronization of your simultaneously running computations on different threads/machines. You do not really have to think about your computations running in parallel, but just about how to put them into different execution environments to run them concurrently. So you end up doing "concurrent programming".

When you do not have an embarrassingly parallel program, you still use the base concepts of running something concurrently (e.g. threads), but now your main focus shifts to how the multiple computations can happen simultaneously. Now you end up doing "parallel programming" or parallel computation.

In the end, the terminology here is less than ideal. My main point was that some kind of distinction matters, as TFA clearly discusses different topics from what many people think about from a web dev perspective (e.g. async, futures, etc.)

Salgat · 3 years ago
Vector processing is not parallelism, but rather non-scalar. Specifically, it's a single operation that is able to do work on multiple data items, rather than parallel processors doing work at the same time.
dahart · 3 years ago
I know we’re kind of stuck with the term of art “concurrent”, and will forever have to explain the difference between the synonymous words “concurrent” and “parallel” — the Merriam-Webster definition of “concurrent” uses the word “parallel”, for example.

Wouldn’t it be nice if we could come up with a better word for this that doesn’t literally overlap with ‘parallel’ and doesn’t need to deviate so far from its dictionary definition?

Personally I think of JavaScript as ‘asynchronous’, and I know that as a term of art this means a programming model, but it’s a lot easier to see that async can be done with a single thread and isn’t necessarily parallel, right?

Salgat · 3 years ago
Parallel programming is simply a type of concurrent programming. Concurrent means that two tasks can both progress in a given duration. On a single core computer every thread runs concurrently. Parallel expands on concurrency to mean that two concurrent tasks can also run at the exact same time. On a multi-core computer threads can run in parallel. In many cases concurrent programming and parallel programming have little to no difference, and you program with the assumption that every task can run in parallel (for example, whenever you use async/await or the threadpool).
remontoire · 3 years ago
Watching geohot code a general matrix multiply algorithm and optimise it from 0.9 GFLOPS to 100 GFLOPS by only tinkering with cache locality makes me wonder how much effort should be put into single-threaded performance before ever thinking about multithreading.
jetbalsa · 3 years ago
I've seen stuff like that before with a game called Factorio, the only game I've ever seen that is optimized so hard that your RAM speed can affect large bases rather quickly, same with faster L2 cache. Their entire blog series[1] covers a large part of how they did this. But for a game written mostly in Lua they sure did a good job on it.

1: https://www.factorio.com/blog/post/fff-204

DizzyDoo · 3 years ago
Yes, the Factorio devs had an approach where they optimised everything happening in the game in the original single-threaded environment before moving on to multithreaded support. That's where the game is now, and as far as I understand it the multithreading occurs on each independent set of conveyor belts or belt lanes; there's some info on that in this blog post[0] for anyone interested.

[0] - https://www.factorio.com/blog/post/fff-364

rcxdude · 3 years ago
It makes sense that a simulation game like Factorio would be memory bandwidth limited: each tick it needs to update the state of a large number of entities using relatively simple operations. The main trick to making it fast is reducing the amount of data you need to update each tick and arranging the data in memory so you can load it fast (i.e. predictably and sequentially, so the prefetcher can do its job) and only need to load and modify it once each tick (at least in terms of loading and eviction from cache). The complexity is in how best to do that, especially both at once. For example, in the blog post they have linked lists for active entities. This makes sense from the point of view of not loading data you don't need to process, but it limits how fast you can load data, because you can only prefetch one item ahead (compared to an array, where you can prefetch as far ahead as your buffers will allow).
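
A toy sketch of that prefetching difference (helper names hypothetical):

    #include <stddef.h>

    struct node { int value; struct node *next; };

    // Linked list: the address of node i+1 is only known after node i
    // has been loaded, so the prefetcher can't usefully run ahead.
    long sum_list(const struct node *head) {
        long sum = 0;
        for (const struct node *n = head; n != NULL; n = n->next)
            sum += n->value;
        return sum;
    }

    // Array: addresses are sequential and predictable, so the hardware
    // prefetcher can fetch many cache lines ahead of the loop.
    long sum_array(const int *a, size_t len) {
        long sum = 0;
        for (size_t i = 0; i < len; i++)
            sum += a[i];
        return sum;
    }

Both are O(n), but on large inputs the array version typically wins by a large constant factor for exactly the reason above.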
sapiogram · 3 years ago
Note: Since this post, they've multithreaded a fair bit of the simulation as well. It runs insanely well, the entire game is a marvel of software engineering in my book.
bob1029 · 3 years ago
Fintech has mostly determined that 1 thread can get the job done. See LMAX disruptor and related ideas.

What problems exist that generate events or commands faster than 500 million per second? This is potentially the upper bar for 1 thread if you are clever enough.

Latency is the real thing you want to get away from. Adding more than one CPU into the mix screws up the hottest possible path by ~2 orders of magnitude. God forbid you have to wait on the GPU or network. If you have to talk to those targets, it had better be worth the trip.

Retric · 3 years ago
> What problems exist that generate events or commands faster than 500 million per second?

AAA games, Google search, Weather simulation, etc? I mean it depends on what level of granularity you’re talking about, but many problems have a great deal going on under the hood and need to be multi threaded.

crabbone · 3 years ago
This is a wrong view of the problem. Oftentimes your application has to be distributed for reasons other than speed: there are only so many PCIe devices you can connect to a single CPU, there are only so many CPU sockets you can put on a single PCB, and so on.

In large systems, parallel / concurrent applications are the baseline. If you have to replicate your data as it's being generated into a geographically separate location, there's no way you can do it in a single thread...

preseinger · 3 years ago
> God forbid you have to wait on the GPU or network. If you have to talk to those targets, it had better be worth the trip.

few programs are CPU-bound, most programs are bottlenecked on I/O waits like these

jason_wo · 3 years ago
As far as I know the LMAX Disruptor is a kind of queue/buffer to send data from one thread/task to another.

Typically, some of the tasks run on different cores. The LMAX Disruptor is designed such that there is no huge delay due to cache coherency. It is slow to sync the cache of one core to the cache of another core when both cores write to the same address in RAM. The LMAX Disruptor is designed so that each memory location is (mostly) written to by at most one thread/core.
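
Not the actual Disruptor, but a minimal single-producer/single-consumer sketch of that "one writer per memory location" idea in C11 (names and sizes hypothetical):

    #include <stdatomic.h>
    #include <stdbool.h>

    #define QSIZE 1024 /* power of two */

    struct spsc {
        long buf[QSIZE];
        /* The alignment keeps the two counters on separate cache lines,
           so producer and consumer never invalidate each other's line
           just by bumping their own counter. */
        _Alignas(64) atomic_size_t head; /* written only by the consumer */
        _Alignas(64) atomic_size_t tail; /* written only by the producer */
    };

    bool spsc_push(struct spsc *q, long v) {
        size_t t = atomic_load_explicit(&q->tail, memory_order_relaxed);
        size_t h = atomic_load_explicit(&q->head, memory_order_acquire);
        if (t - h == QSIZE) return false; /* full */
        q->buf[t % QSIZE] = v;
        atomic_store_explicit(&q->tail, t + 1, memory_order_release);
        return true;
    }

    bool spsc_pop(struct spsc *q, long *out) {
        size_t h = atomic_load_explicit(&q->head, memory_order_relaxed);
        size_t t = atomic_load_explicit(&q->tail, memory_order_acquire);
        if (t == h) return false; /* empty */
        *out = q->buf[h % QSIZE];
        atomic_store_explicit(&q->head, h + 1, memory_order_release);
        return true;
    }

On a single core the same structure still works fine as a concurrency primitive; the cache-coherency win only matters once producer and consumer sit on different cores.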

How is the LMAX Disruptor relevant for programs with 1 core?

IshKebab · 3 years ago
It's quite rare for problems to be dominated by hot loops in the same way that matrix multiplication is.

Think about something like speeding up a compiler or a web server or a spreadsheet. There's no 50-line function that you can spend a few hours optimising and speed up the whole thing.

That's part of the reason why Python programs (except maths heavy stuff like ML) tend to be so slow despite everyone saying "just write your hot code in C". You can't because there is no hot code.

ModernMech · 3 years ago
This advice dates back to when Python was primarily used by scientists to drive simulations. I remember hearing this advice in the early 2000s as a physics student using vpython to drive N-body simulations. They told us to do the physics in C, but everything else in Python due to the simulation math taking too long in raw Python. We couldn’t make the visualization run at a reasonable frame rate without those optimizations.

These days Python is being used for everything, even things without hot loops as you note. Yet the advice persists.

nmarinov · 3 years ago
Fun video [0]. The optimization bit starts at 0:35.

[0]: https://www.youtube.com/watch?v=VgSQ1GOC86s
nradov · 3 years ago
That appears to be one of the major design goals for the Mojo programming language. It allows the developer to code at a high level of abstraction and indicate where parallel execution should be used. Then the execution environment automatically optimizes that at runtime based on the actual hardware available. That hardware may change drastically over the full lifecycle of the application code so in the future it will automatically take advantage of hardware advances such as the introduction of new types of specialized co-processors.

https://www.modular.com/mojo

ascar · 3 years ago
Cache locality is certainly an important aspect, and especially when it comes to matrix multiplication even just changing the loop order (and thus the access pattern) has a huge performance impact. However, an O(n^3) algorithm will always lose out to an O(n^2*log(n)) algorithm when the input gets big enough. And the difference between these two might be as simple as sorting your input data first.

I teach parallel programming to graduates and the first exercise we give them is a sequential optimization, for exactly that reason. Think about whether your algorithm is efficient before thinking about all the challenges that come with parallelization.
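
To make the loop-order point concrete, a sketch of the classic matrix-multiply reordering (same O(n^3) work, same result, very different access pattern):

    #define N 512

    /* Textbook ijk order: the inner loop walks b[k][j] down a column,
       touching a new cache line on almost every iteration. */
    void matmul_ijk(double a[N][N], double b[N][N], double c[N][N]) {
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                for (int k = 0; k < N; k++)
                    c[i][j] += a[i][k] * b[k][j];
    }

    /* ikj order: the inner loop walks rows of b and c, so accesses are
       sequential and prefetch-friendly; this is often several times
       faster despite doing the exact same arithmetic. */
    void matmul_ikj(double a[N][N], double b[N][N], double c[N][N]) {
        for (int i = 0; i < N; i++)
            for (int k = 0; k < N; k++)
                for (int j = 0; j < N; j++)
                    c[i][j] += a[i][k] * b[k][j];
    }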

nwallin · 3 years ago
> [re: matrix multiplication] However, an O(n^3) algorithm will always lose out to an O(n^2*log(n)) algorithm when the input gets big enough.

You have to be very careful about what 'big enough' means. In practice, Strassen multiplication is not faster than the naive algorithm until you get to the point where you're multiplying matrices with hundreds of rows/columns. Additionally, naive matrix multiplication is well suited to GPUs, while Strassen multiplication on the GPU requires temporary buffers and multiple jobs and sequencing and whatnot.

As a general rule, matrix multiplication with complexity better than the naive algorithm should probably not be used. Do naive matrix multiplication on the CPU. If you need it to be faster, do naive matrix multiplication on the GPU. If you need it to be faster, the numerical stability of your problem has probably already come a gutser and will get worse if you switch to Strassen or any of the other asymptotically faster algorithms.

And the algorithms faster than Strassen? Forget about it. After Strassen multiplication was invented, about a dozen or so other algorithms came along, slowly reducing that O(n^2.8) to about O(n^2.37188) or so (most recently in 2022; this is still an area of active research). The problem is that for any of these algorithms to be faster than Strassen, you need matrices that are larger than what you can keep in memory. There is no big enough input that will fit in the RAM of a modern computer. One estimate I've heard is that if you convert every atom in the observable universe into one bit of RAM, and you use that RAM to multiply two 10^38 by 10^38 matrices to get a third 10^38 by 10^38 matrix, you're still better off using the O(n^2.8) Strassen multiplication instead of the state-of-the-art O(n^2.37188) algorithm. The constant slowdowns in the other algorithms really are that bad.

anchovy_ · 3 years ago
> However, an O(n^3) algorithm will always lose out to an O(n^2*log(n)) algorithm when the input gets big enough.

Sure, this might be the case from a theoretical point of view (as per the definition), but this completely disregards the hidden constants that come to light when actually implementing an algorithm. There's a reason why, for instance, a state-of-the-art matrix multiplication algorithm [0] can be completely useless in practice: the input data will never become large enough to amortize the introduced overhead.

[0] https://arxiv.org/abs/2210.10173

Hbruz0 · 3 years ago
Link for that ?
Const-me · 3 years ago
As a developer, I often choose higher-level APIs not listed in that article.

On Windows, OSX and iOS the OS userland already implements general, and relatively easy to use, thread pools. On Windows, see CreateThreadpoolWork, WaitForThreadpoolWorkCallbacks, etc. It’s easier to use threads with locks while someone else is managing these threads. On Apple, the pool is called “grand central dispatch” and does pretty much the same thing.

The modern Windows kernel supports interesting synchronization APIs like WaitOnAddress and WakeByAddressSingle, which make it possible to implement locks without the complexity or performance overhead of maintaining special synchronization objects.

The Linux kernel implements performant and scalable message queues, see mq_overview(7). And it has synchronization objects like eventfd() and pidfd_open() which make it possible to integrate locks or other things into poll/epoll based event loops.
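
As a rough sketch of the eventfd() pattern (Linux-only, error handling omitted): one thread bumps the counter, and an epoll-based loop wakes up and drains it.

    #include <sys/eventfd.h>
    #include <sys/epoll.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        int efd = eventfd(0, EFD_NONBLOCK); /* kernel-side 64-bit counter */
        int ep = epoll_create1(0);

        struct epoll_event ev = { .events = EPOLLIN, .data.fd = efd };
        epoll_ctl(ep, EPOLL_CTL_ADD, efd, &ev);

        /* A worker thread signals the loop by adding to the counter. */
        uint64_t one = 1;
        write(efd, &one, sizeof one);

        /* The event loop wakes up; the read drains and resets the
           counter, so one wakeup absorbs any number of signals. */
        struct epoll_event out;
        epoll_wait(ep, &out, 1, -1);
        uint64_t count;
        read(efd, &count, sizeof count);
        printf("woken, %llu signal(s) pending\n", (unsigned long long)count);

        close(efd);
        close(ep);
        return 0;
    }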

williamcotton · 3 years ago
GCD/libdispatch is a fantastic approach to concurrency and you can build and install support for non-Apple operating systems:

https://github.com/apple/swift-corelibs-libdispatch

Here’s a simple echo server:

https://github.com/williamcotton/c_playground/blob/master/sr...

Here’s a simple multithreaded database pool:

https://github.com/williamcotton/express-c/blob/master/src/d...

kprotty · 3 years ago
libdispatch's idea of using specific queues for serialized concurrency is nice, but its abstractions on top are unfortunately not as efficiently designed as they could be; `dispatch_source` doesn't allow for direct completion-based IO schemes (io_uring, IOCP), pushing to a `dispatch_queue` always requires a heap allocation, and `dispatch_semaphore`/`dispatch_sync` block the thread instead of yielding asynchronously (which can cause "thread explosion"). I don't think systems like Go have these constraints.
OnlyMortal · 3 years ago
If you’re in C++ land, you might take a look at Boost ASIO. It’s not just for IPC and would give you portable code.
jb1991 · 3 years ago
And in C++ you can also use this dead-simple header file for a nice high-level, modern threadpool using function objects (lambdas) for very easy parallelization of arbitrary tasks: https://github.com/progschj/ThreadPool
codys · 3 years ago
Please don't use POSIX message queues (`mq_*`, `mq_overview`, etc.) when you're writing a program with a single address space (like something that uses threads in a single process).

POSIX message queues (the `mq_*` functions) are much slower than optimized shared-memory queues using typical atomics, and they have semantics that are unexpected to most (because of what they are designed to be used for, they persist like files after process termination, they have system-level limits on queue size and item size, etc.).

A simple benchmark vs rust's `std::sync::mpsc` queue shows `std::sync::mpsc` is 28.6 times faster when using 1 producer and 1 consumer, and is 37.32 times faster with 2 producers and 1 consumer.

vmfunction · 3 years ago
What about "green threads" that are not managed by the OS, like https://tokio.rs ?
flaghacker · 3 years ago
Tokio is using Rust's async features, which are not green threads. In the former, code has to explicitly mark potential yield points; in the latter, green threads can be scheduled by the runtime without any help from the code itself.

As a historical note, Rust used to have green threads, but they were abandoned a long time ago. This is a good talk about both the differences between different forms of concurrency/async and Rust's history with them: https://www.infoq.com/presentations/rust-2019/ (includes a transcript)

Const-me · 3 years ago
I believe these green threads work well when there’s good support in the language, runtime and standard library. I have built complicated concurrent software in C# with async-await. I don’t program golang, but I heard the concurrency model works rather well in golang too, probably for the same reason as C#: good support in the language and the runtime.

I have no idea about Tokio. I don’t program Rust, and the feedback I read about async/await was mixed.

kllrnohj · 3 years ago
You only ever need this if you're trying to have hundreds of thousands to millions of threads. It's a very niche problem to have.
aleph_minus_one · 3 years ago
Green threads do not make use of multiple cores of a modern processor.
uguuo_o · 3 years ago
In scientific computing (computational fluid dynamics, computational electromagnetics, etc.), parallel programming is a must. Most of the algorithms are not embarrassingly parallel and need a significant amount of communication between threads at runtime. We mostly use one of the many MPI [0] libraries available for desktop and high-performance computing machines. Using these programming paradigms is difficult but tends to result in fantastic scalability for all types of engineering problems.

[0] https://en.m.wikipedia.org/wiki/Message_Passing_Interface

OscarCunningham · 3 years ago
So this might be a naive question, but is it literally that you're simulating a fluid in parallel by having each thread simulate a portion of the space? And then the message passing between threads is the data about the fluid on the boundary?
uguuo_o · 3 years ago
It depends on the method, but generally yes: the physical domain is decomposed across the cores. Data is synchronized at processor boundaries at a very high rate to maintain consistency.
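
A minimal sketch of such a boundary exchange in C with MPI, assuming a 1-D domain decomposition (names and sizes hypothetical):

    #include <mpi.h>

    #define NLOCAL 1000 /* interior cells owned by this rank */

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* u[0] and u[NLOCAL+1] are ghost cells that mirror the
           neighbouring ranks' edge cells. */
        double u[NLOCAL + 2] = {0};
        int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
        int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

        /* One timestep's synchronization: send our edge cells and
           receive the neighbours' edges into our ghost cells.
           MPI_PROC_NULL makes the exchanges at the domain ends no-ops. */
        MPI_Sendrecv(&u[1], 1, MPI_DOUBLE, left, 0,
                     &u[NLOCAL + 1], 1, MPI_DOUBLE, right, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(&u[NLOCAL], 1, MPI_DOUBLE, right, 1,
                     &u[0], 1, MPI_DOUBLE, left, 1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        /* ... update interior cells using the refreshed ghosts ... */

        MPI_Finalize();
        return 0;
    }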
elil17 · 3 years ago
Computational fluid dynamics is actually a problem where parallelization doesn't get you much. It is primarily limited by memory bandwidth.
klabb3 · 3 years ago
I’m way-above-average interested in concurrent programming but this 600+ page brick will probably remain on my reading list until I am stranded on a deserted island. Did anyone here read the whole thing? Can one make a reasonable summary or is this more of a lexicon of different techniques?
tambourine_man · 3 years ago
Stranded on a deserted island is a bit radical, but I recommend going to a place with no internet connection for a week or so. I thought I had a long reading problem. Turns out, I have an internet problem.
sn9 · 3 years ago
The circuit in my apartment that my wifi is on keeps flipping so my connection has been frustratingly intermittent.

I've gotten more reading and cleaning done in the last week than I have in maybe years.

tomrod · 3 years ago
I didn't realize I also was a tambourine_man -- you literally just described my recent realization.

I'm moving this week and won't have internet for a few days. I look forward to the relative break.

ponderings · 3 years ago
Some 40 years ago in the Netherlands we had no TV broadcast at night, because people should be in bed.

I wonder if an ISP could offer a product like noon till 7 (or even 4am till 9am), when few people use bandwidth. Perhaps combined with a very slow connection the rest of the day.

ascar · 3 years ago
> concurrent programming

If you're interested in concurrent programming this book won't give you much. The topics focus on the parallelization of a sequential algorithm and go into detail about things like synchronization (locking, barriers), important HW details (like caches and CPU pipelining) and algorithmic approaches. Imagine a weather simulation and not concurrent requests to a web server.

klabb3 · 3 years ago
Good to know! Yes I’m less interested in “the art of programming but parallel” (partly because I don’t have PhD level ambitions) and more interested in how to effectively program day-to-day boring stuff that also needs concurrency.

My assessment is that the latter is nowhere near “solved” (I’m hesitant to use that word because craft, unlike e.g. proofs, is about trade-offs), in the sense that concurrency differs a lot across our tools (languages, runtimes, etc.) and it even differs quite substantially within the main paradigms, like coroutines, async, green threads, native threading, etc.

If we compare with other advanced language features like, say, memory management, we’ve come much further, imo (GC, RAII, ref counting and manual management are all pretty well understood, as is the stack/heap duality, and adopted basically universally). With concurrency we have, if we’re being generous, merely mutexes as the common ground. Even “simple” notification mechanisms and ownership transfer across tasks vary greatly across languages and are often quite contrived.

genrilz · 3 years ago
I've read an earlier edition. I would say it is more of an overview of the different concerns and design patterns that someone doing parallel programming would find useful. For instance, it gives you an overview of how CPU caches work, and uses that to explain why synchronization is expensive. That motivates both doing as little synchronization as possible and designing systems which don't need (as much) synchronization, and also explains why RCU works so well as long as you don't need to update the synchronized structure often.

If you want a good overview, I'd recommend reading the roadmap in section 1.1. Also, I know a 600 page book (400 pages if you exclude appendixes) is a bit long, but I really enjoyed both the material presented and the style in which it was written. Hopefully that makes the length feel a bit less intimidating.

hospitalJail · 3 years ago
Check out the table of contents.

I did a quick flip through and realized that I'm never going to be doing low level multithreading, so I don't need to deal with OS layer stuff. There were some other ideas too, might be worth flipping to relevant areas.

Heck, if I do end up using it, I'll likely read a summary when I'm dealing with it.

There are some ideas that I haven't heard of, which was alright, but again, since my current language (Python) handles it and I am used to doing multithreading using those libraries, I don't get a ton of value out of reading the fundamentals. (Opportunity cost.)

ChrisMarshallNY · 3 years ago
I do a lot of it, but never on the low-end.

99% of mine is responding to closures for network events and device responses.

When you do that, synchronizing is one way to deal with things (wait for the other thing to finish), or completion testing (is the thing ready for the next step?). Basically, they are the same thing.

You are also not always guaranteed a standard context, but modern languages make it easy to hook to one. In the "old days," we used to use RefCons (Reference Context hooks).

KeplerBoy · 3 years ago
There's no need to read the whole thing. Read a chapter you're interested in and go back to the book once you're interested in another chapter. It's not a novel.
johnnyanmac · 3 years ago
the PDF I opened was almost 1000 pages. And yes, from what I'm viewing in the Table of Contents, this is more of a comprehensive textbook covering almost every corner of the topic. I don't think it's meant to be read linearly (ha), nor is it expected to be binged in a few large sittings. Definitely meant for professionals or very motivated students more than a general hobbyist.
jbergens · 3 years ago
If one thousand of us read one page each we can be done in 15 minutes!

Then we just have to sync our knowledge.

Tepix · 3 years ago
I was going to read it, but then i played a round of Qwitzatteracht, the golf game, instead.
giovannibonetti · 3 years ago
Functional programming can be a great way to handle parallel programming in a sane way. See the Futhark language [1], for example, that accepts high-level constructs like map and reduce, then converts them to the appropriate machine code, either on the CPU or the GPU.

[1] https://futhark-lang.org/

davidgrenier · 3 years ago
This looks brutal. For a lot of people, John Reppy's book Concurrent Programming in ML (as in SML, not Machine Learning) is going to be much more accessible. Pick the CSP-style library in the programming language of your choice:

Go with goroutines and channels

Clojure with core.async

F# with Hopac

It would be a very interesting project to roll your own in C# using Microsoft Robotics Studio's CCR (Coordination and Concurrency Runtime) (though I speculate those are buffered channels by default).

PheonixPharts · 3 years ago
There are multiple replies like this one, but it's a bit shocking to see that on Hacker News people don't know the difference between concurrent programming and parallel programming.

Concurrency means that you can have multiple tasks running in the same time period; parallelism means you have multiple tasks running at the same time.

The most obvious demonstration of this is that you can (and many languages do) have single threaded concurrency.

I did some grad course work in parallel programming and there's really no way to not make it "brutal", because to really do it in a way that increases performance you need to really understand some low-level performance issues.

cpach · 3 years ago
Correct me if I’m wrong, but isn’t concurrency enough for “most” use-cases? When does one really need true parallelism?
Paradigma11 · 3 years ago
https://github.com/Hopac/Hopac is such an impressive piece of software. Too bad it never really took off like it deserved, but with more popular competition like Rx, or just tasks/async (which is enough for most stuff), that was pretty unavoidable.
kubanczyk · 3 years ago
Etymology of ML: "Meta Language"
cmrdporcupine · 3 years ago
Concurrent state management is hard, and I remain disappointed that software transactional memory -- and pushing a DB-style approach to non-DB workloads generally -- hasn't caught on.

I think most "application" type programs would benefit from being able to manage state in terms of higher level transactional operations and let their runtime take care of serializing and avoiding deadlock and race conditions. Developers in those spaces have come to rely on garbage collection in their runtimes, I see no reason why they couldn't come to also rely on a fully isolated MVCC transaction model (and only get access to non-transactional memory in exceptional circumstances). Bonus points if said transactions tie back to the persistent RDBMS transaction as well.

There are too many minefields in manual lock management and the tools remain fairly low level. Ownership management in languages like Rust helps, following a discipline like an actor/CSP approach helps, etc. but in the end if there's shared state and there's parallel work, there's potential for trouble.

twoodfin · 3 years ago
You might be interested in Joe Duffy’s retrospective on Microsoft’s failed experiment to integrate STM into .NET:

https://joeduffyblog.com/2010/01/03/a-brief-retrospective-on...

mrkeen · 3 years ago
You really only need one paragraph from that:

> What do we do with atomic blocks that do not simply consist of pure memory reads and writes? (In other words, the majority of blocks of code written today.)

If you could just get programmers to stop mutating, you could get STM (which incidentally would give you back the ability to mutate)

cmrdporcupine · 3 years ago
Thanks, skimmed a bit of the first half but will get into it more after my work day.
mrkeen · 3 years ago
It's a good day when I get to use STM.

While the primary benefit is in thinking "these things should happen together or not at all" rather than thinking about locks, there's another feature I always forget about, which is retrying.

Retrying lets you 'nope' out of a transaction and try again as soon as something changes, without busy-waiting.