Hey, I've been following up on Legion and Regent quite a bit, excellent work there.
Do you have a set of benchmarks that others can reimplement to compare the approaches?
I've added dataflow parallelism to my own multithreading runtime[1], but I don't have dataflow-focused benchmarks yet. I could add Cholesky decomposition, but it's quite involved.
I expect the people from TaskFlow[2] and Habanero[3] (via Data-Driven Futures) would be quite interested as well in a common set of dataflow parallelism benchmarks.
By the way, if you haven't read the DaCe paper[4], you absolutely should. It seems the age of dataflow parallelism and properly optimizing for data is coming.
Please correct me if I'm wrong, but I think all of those systems only work inside a single process. Legion/Regent support distributed multi-node, multi-process execution, both on supercomputers and in the cloud.
I've always wondered why there isn't a general-purpose, purely functional language that computes a dependency graph and implicitly parallelizes all operations. For some things, like a map over a list, I understand that the overhead of distributing the work is greater than just using one thread, but for things known at compile time (like dependences between variables), the runtime cost of deciding what to distribute should be zero.
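The idea can be sketched in a few lines. This is a hypothetical illustration, not any shipping system: given a known dependency graph between values, a runtime can automatically run independent computations on separate threads.

```python
# Minimal sketch: schedule a dependency graph of computations, running
# independent tasks in parallel and chaining dependent ones via futures.
from concurrent.futures import ThreadPoolExecutor

def run_dag(tasks, deps):
    """tasks: name -> function of its dependences' results.
    deps: name -> ordered list of prerequisite task names."""
    futures = {}
    with ThreadPoolExecutor() as pool:
        def submit(name):
            if name not in futures:
                prereqs = [submit(d) for d in deps.get(name, ())]
                futures[name] = pool.submit(
                    lambda fn=tasks[name], ps=prereqs:
                        fn(*[p.result() for p in ps]))
            return futures[name]
        for name in tasks:
            submit(name)
        return {n: f.result() for n, f in futures.items()}

# "a" and "b" have no mutual dependence, so they may run in parallel;
# "c" waits for both of their results.
out = run_dag(
    {"a": lambda: 1, "b": lambda: 2, "c": lambda a, b: a + b},
    {"c": ["a", "b"]},
)
print(out["c"])  # 3
```

In a purely functional language the `deps` table would fall out of the program text at compile time, which is exactly the appeal: the analysis is free at runtime.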
Prior to working on Legion, I worked on a programming language called Sequoia that matches much of what you are describing [1]. In many ways Sequoia was the spiritual ancestor of Legion/Regent.
The main difference between Legion and TensorFlow is how and when the dataflow graph is constructed. In TensorFlow the graph is constructed lazily (no execution is performed until you've asked for it), it's optimized, and then distributed to processors (GPUs/TPUs) for execution. In Legion, the graph is built, distributed, and executed on the fly. What this means is that Legion can react to things like dynamic control flow (e.g. branches inside of loops) and analyze dependences at runtime to find task parallelism, in a similar way to how your out-of-order CPU extracts instruction level parallelism from a program. Doing things in the TensorFlow model works better when you can see your whole program up front and can "statically" optimize and schedule it because it has lower overheads, but it also has limits to the kinds of programs it can handle; the Legion approach works better when you have dynamic data-dependent behavior in your program and you need to react to it on the fly, but it does have some overhead to the dynamic analysis.
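The "on the fly" approach can be made concrete with a toy sketch. This is purely illustrative and not Legion's real API: each task declares read/write sets, and the runtime discovers parallelism at launch time by tracking the last writer of each region, which works even under dynamic control flow.

```python
# Illustrative sketch (not Legion's actual API): an eager task runtime that
# derives dependences at runtime from declared read/write footprints.
from concurrent.futures import ThreadPoolExecutor

class Runtime:
    def __init__(self):
        self.pool = ThreadPoolExecutor()
        self.last_writer = {}  # region name -> future of its last writing task

    def launch(self, fn, reads=(), writes=()):
        # A new task waits for the last writer of every region it touches;
        # tasks with disjoint footprints run concurrently. (Write-after-read
        # hazards are ignored here to keep the sketch short.)
        waits = [self.last_writer[r] for r in (*reads, *writes)
                 if r in self.last_writer]
        fut = self.pool.submit(
            lambda: ([w.result() for w in waits], fn())[-1])
        for w in writes:
            self.last_writer[w] = fut
        return fut

state = {"x": 0, "y": 0}
rt = Runtime()
rt.launch(lambda: state.update(x=1), writes=["x"])
rt.launch(lambda: state.update(y=2), writes=["y"])   # independent of the first
total = rt.launch(lambda: state["x"] + state["y"], reads=["x", "y"])
print(total.result())  # 3
```

The key contrast with the lazy TensorFlow model is that nothing here requires seeing the whole graph up front; each `launch` is analyzed as it is issued, like instructions entering an out-of-order CPU.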
TL;DR: TF's executor parallelism is too pessimistic to fully exploit the parallelization opportunities in the problem space. Regent is built on top of Legion, a cutting-edge dataflow library, and is designed to provide the guarantees needed to achieve that speedup.
I don't understand why something like this would need a separate language. Switching languages means starting over in many ways with regard to tools, libraries, and semantics. A graph of tasks can be built with a plain C-ABI (cdecl) library.
That's an assertion, but without anything to back it up.
First, threads have been implemented as libraries many times. Second, if checks need to happen, they can happen at debug run time when they can't happen at compile time. I don't see what specifically has to be integrated into a language here that justifies throwing away the enormous amount already built in other languages.
An often-overlooked quality of any new language is the set of things that you can't do in it. Some features can only be accomplished when certain negative guarantees can be made about programs. And it's really hard to implement negative guarantees as a library.
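A concrete (hypothetical) illustration of why such negative guarantees resist library implementation: a library runtime only sees the footprints tasks *declare*, so it cannot prove a task touches nothing else. Both tasks below declare disjoint footprints, so a library-level check would approve running them in parallel, yet `task_b` sneaks in an undeclared read.

```python
# Sketch: a library-level disjointness check passes even though the
# schedule it approves is unsound, because one task has an undeclared
# access the library cannot see. A language can reject this statically.
state = {"x": 0, "y": 0}

def task_a():            # declared footprint: writes {"x"}
    state["x"] = 1

def task_b():            # declared footprint: writes {"y"}
    state["y"] = state["x"] + 1   # undeclared dependence on task_a!

declared = {"task_a": {"writes": {"x"}},
            "task_b": {"writes": {"y"}}}

def footprints_disjoint(a, b):
    return not (declared[a]["writes"] & declared[b]["writes"])

# The check wrongly concludes the tasks are independent:
print(footprints_disjoint("task_a", "task_b"))  # True
```

Ruling out the undeclared access is exactly the negative guarantee ("this task accesses nothing outside its declared regions") that needs language-level enforcement.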
It's worth noting that Regent is the language that implements the Legion programming model. The Legion runtime system is just a C++ library with bindings for C, Fortran, Python, Terra, and Lua. Writing Regent code is much higher productivity than writing to the C++ Legion library directly, but if you want to you can drop down and write your tasks in any of the other languages that Legion supports. You can even mix and match tasks written in different languages.
[1]: https://github.com/mratsim/weave#dataflow-parallelism
[2]: https://github.com/taskflow/taskflow
[3]: https://github.com/habanero-rice/hclib
[4]: https://github.com/spcl/dace, https://arxiv.org/abs/1902.10345
[1]: http://theory.stanford.edu/~aiken/publications/papers/sc06.p...
It's the same reason.
Your dataflow semantics need to be part of the language semantics, otherwise they're bound to be loosely defined and even more loosely enforced.