Compare with a simple pipeline in bash: each of those components (grep, sed, xargs) executes in parallel, with the intermediate results streaming between them. You get a similar effect with coroutines. Compare Ruby: in that case, each line is processed sequentially, with a complete array being created between each step. Nothing actually gets pipelined.
Despite being clean and readable, I don't tend to do it any more, because it's harder to debug. More often these days, I write things like this:
data = File.readlines("haystack.txt")
data = data.map(&:strip)
data = data.grep(/needle/)
data = data.map { |i| i.gsub('foo', 'bar') }
data = data.map { |i| File.readlines(i).count }
It's ugly, but you know what? I can set a breakpoint anywhere and inspect the intermediate states without having to edit the script in prod. Sometimes ugly and boring is better.
> The author keeps calling it "pipelining", but I think the right term is "method chaining". [...] You get a similar effect with coroutines.
The inventor of the shell pipeline, Douglas McIlroy, always understood the equivalence between pipelines and coroutines; it was deliberate. See https://www.cs.dartmouth.edu/~doug/sieve/sieve.pdf It goes even deeper than it appears, too. In the original Unix kernel implementation of pipes, when the pipe buffer was filled[1] by the writer, the kernel continued execution directly in the blocked reader process without bouncing through the scheduler. Effectively, arguably literally, coroutines: one process calls the write function and execution continues with a read call returning the data.
Interestingly, Solaris Doors operate the same way by design--no bouncing through the scheduler--unlike pipes today: I think most Unix kernels moved away from direct execution switching long ago, to better support multiple readers, etc.
[1] Or even on the first write? I'd have to double-check the source again.
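As a rough illustration of that pull-driven, coroutine-like behaviour, here is a minimal Python sketch (generators standing in for the processes; the file name and stages are made up for the example):

```python
# Each stage is a generator: it only does work when the next stage pulls from it,
# much like a pipe writer handing control over to the blocked reader.
def read_lines(path):
    with open(path) as f:
        for line in f:
            yield line.rstrip("\n")

def grep(pattern, lines):
    for line in lines:
        if pattern in line:
            yield line

def substitute(old, new, lines):
    for line in lines:
        yield line.replace(old, new)

# Wiring the stages together only builds the pipeline; nothing runs until we iterate.
pipeline = substitute("foo", "bar", grep("needle", read_lines("haystack.txt")))
for line in pipeline:
    print(line)
```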
I don’t find your “seasoned developer” version ugly at all. It just looks more mature and relaxed. It also has the benefits that you can actually do error handling and have space to add comments.
Maybe people don't like it because of the repetition of "data =", but in fact you could use descriptive new variable names, making the code even more readable (self-documenting).
I've always felt method chaining looks "cramped", if that's the right word. Like a person drawing on paper but only using the upper left corner. However, this surely is also a matter of preference or what you're used to.
Something like this is hell to read and understand later, imo. You have to read through a lot of intermediate variables that don't matter anywhere else in the code after you set it up, but you don't necessarily know in advance which ones matter and which don't unless you read and understand all of it. It also pollutes your workspace with too much stuff, so while this makes it easier to debug, it makes it harder to read some time later. It becomes even clumsier if you need to repeat the code; you probably need to extract a function block then, which just moves the clumsiness there.
What I do now is start by defining the transformation in each step as a pure function, chain them together once everything works, and wrap the whole thing in an error handler so that I depend less on breakpoint debugging.
There is certainly a trade off, but as a codebase grows larger and deals with more cases where the same code needs to be applied, the benefits of a concise yet expressive notation show.
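A minimal Python sketch of that approach, assuming made-up step functions: each step is pure, the chaining happens in one place, and the error handler reports which step failed:

```python
def strip_lines(lines):
    return [line.strip() for line in lines]

def keep_needles(lines):
    return [line for line in lines if "needle" in line]

def foo_to_bar(lines):
    return [line.replace("foo", "bar") for line in lines]

def run_pipeline(value, *steps):
    # Chain the pure steps; if one blows up, say which one.
    for step in steps:
        try:
            value = step(value)
        except Exception as exc:
            raise RuntimeError(f"pipeline failed in {step.__name__}") from exc
    return value

with open("haystack.txt") as f:
    result = run_pipeline(f.readlines(), strip_lines, keep_needles, foo_to_bar)
```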
Code in this "named-pipeline" style is already self-documenting: using the same variable name makes it clear that we are dealing with a pipeline/chain. Using more descriptive names for the intermediate steps hides this, making each line more readable (and even then you're likely to end up with `dataStripped = data.map(&:strip)`) at the cost of making the block as a whole less readable.
> Maybe people don’t like it because of the repetition of “data =“
Eh, at first glance it looks "amateurish" due to all the repeated stuff. Chaining explicitly eliminates redundant operations - a more minimal representation of data flow - so it looks more "professional". But I also know better than to act on that impulse. ;)
That said, it really depends on the language at play. Some will compile all the repetition of `data =` away such that the variable's memory isn't re-written until after the last operation in that list; it'll hang out in a register or on the stack somewhere. Others will run the code exactly as written, bouncing data between the heap, stack, and registers - inefficiencies and all.
IMO, a comment like "We wind up debugging this a lot, please keep this syntax" would go a long way to help the next engineer. Assuming that the actual processing dwarfs the overhead present in this section, it would be even better to add discrete exception handling and post-conditions to make it more robust.
In most debuggers I have used, if you put a breakpoint on the first line of the method chain, you can "step over" each function in the chain until you get to the one you want.
Bit annoying, but serviceable. Though there's nothing wrong with your approach either.
> The author keeps calling it "pipelining", but I think the right term is "method chaining".
Allow me, too, to disagree. I think the right term is "function composition".
Instead of writing
h(g(f(x)))
as a way to say "first apply f to x, after which g is applied to the result of this, after which h is applied to the result of this", we can use function composition to compose f, g and h, and then "stuff" the value x into this "pipeline of composed functions".
We can use whatever syntax we want for that, but I like Elm's, where you compose with >> and pipe the value in with |>.
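The same idea sketched in Python, in case the Elm is unfamiliar (f, g and h are placeholders):

```python
from functools import reduce

def compose(*fns):
    """Compose left-to-right: compose(f, g, h)(x) == h(g(f(x)))."""
    return lambda x: reduce(lambda acc, fn: fn(acc), fns, x)

f = lambda x: x + 1
g = lambda x: x * 2
h = str

pipeline = compose(f, g, h)  # build the composed function once...
print(pipeline(3))           # ...then "stuff" the value in: prints '8'
```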
I do the same with Python, replacing multilevel comprehensions with intermediate steps of generator expressions (PEP 289: https://peps.python.org/pep-0289/), which are lazy and therefore do not impact performance and memory usage.
Ultimately it will depend on the functions being chained. If they can work on one part of the result, or a subset of parts, then they might not block; otherwise they still need the complete result, and the laziness cannot help.
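For instance, the Ruby chain from the top of the thread could be written as a stack of generator expressions; this is only a sketch, and nothing is evaluated until the final list comprehension pulls data through:

```python
lines    = (line.strip() for line in open("haystack.txt"))
matches  = (line for line in lines if "needle" in line)
replaced = (line.replace("foo", "bar") for line in matches)
# Only this final step forces evaluation; each matching line streams through
# the generators above one at a time, with no intermediate arrays.
counts = [sum(1 for _ in open(path)) for path in replaced]
```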
I think the best term is "function composition", but with a particular syntax so pipelining seems alright. Method chaining is a common case, where some base object is repeatedly modified by some action and then the object reference is returned by the "method", thus allowing the "chaining", but what if you're not dealing with objects and methods? The pipelined composition pattern is more general than method chaining imho.
You make an interesting point about debugging which is something I have also encountered in practice. There is an interesting tension here which I am unsure about how to best resolve.
In PRQL we use the pipelining approach, using the output of the last step as the implicit last argument of the next step. In M Lang (MS Power BI/Power Query), which is quite similar in many ways, they use the second approach, in that each step has to be named. This is very useful for debugging, as you point out, but also a lot more verbose and can be tedious. I like both but prefer the ergonomics of PRQL for interactive work.
Update: Actually, PRQL has a decent answer to this. Say you have a query like:
from invoices
filter total > 1_000
derive invoice_age = @2025-04-23 - invoice_date
filter invoice_age > 3months
and you want to figure out why the result set is empty. You can pipe the results into an intermediate reference like so:
from invoices
filter total > 1_000
into tmp
from tmp
derive invoice_age = @2025-04-23 - invoice_date
filter invoice_age > 3months
So, good ergonomics on the happy path and a simple enough workaround when you need it. You can try these out in the PRQL Playground btw: https://prql-lang.org/playground/
> The author keeps calling it "pipelining", but I think the right term is "method chaining".
I believe the correct definition for this concept is the Thrush combinator[0]. In some ML-based languages[1], such as F#, the |> operator is defined[2] for same:
[1..10] |> List.map (fun i -> i + 1)
Other functional languages have libraries which also provide this operator, such as the Scala Mouse[3] project.
0 - https://leanpub.com/combinators/read#leanpub-auto-the-thrush
1 - https://en.wikipedia.org/wiki/ML_(programming_language)
2 - https://fsharpforfunandprofit.com/posts/defining-functions/
3 - https://github.com/typelevel/mouse?tab=readme-ov-file
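For comparison, the thrush idea can be sketched in Python as a value-first pipe helper (an illustration, not a standard library function):

```python
from functools import reduce

def thrush(value, *fns):
    """Apply fns to value left-to-right: thrush(x, f, g) == g(f(x))."""
    return reduce(lambda acc, fn: fn(acc), fns, value)

# Rough equivalent of: [1..10] |> List.map (fun i -> i + 1)
result = thrush(range(1, 11), lambda xs: [i + 1 for i in xs])
print(result)  # [2, 3, ..., 11]
```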
I'm not sure that's right, method chaining is just immediately acting on the return of the previous function, directly. It doesn't pass the return into the next function like a pipeline. The method must exist on the returned object. That is different to pipelines or thrush operators. Evaluation happens in the order it is written.
Unless I misunderstood the author, because method chaining is super common where I feel thrush operators are pretty rare, I would be surprised if they meant the latter.
Shouldn’t modern debuggers be able to handle that easily? You can step in, step out, until you get where you want, or you could set a breakpoint in the method you want to debug instead of at the call site.
> Despite being clean and readable, I don't tend to do it any more, because it's harder to debug. More often these days, I write things like this:
data = File.readlines("haystack.txt")
data = data.map(&:strip)
data = data.grep(/needle/)
data = data.map { |i| i.gsub('foo', 'bar') }
data = data.map { |i| File.readlines(i).count }
Hard disagree. It's less readable, the intent is unclear (where does it end?), and the variables are rewritten on every step and everything is named "data" (and please don't call them data_1, data_2, ...), so now you have to run a debugger to figure out what's even going on, rather than just... reading the code.
The person you are quoting already conceded that it is less readable, but that the ability to set a breakpoint easily (without having to stop the process and modify the code) is more important.
I myself agree, and find myself doing that too, especially in frontend code that executes in a browser. Debuggability is much more important than marginally-better readability, for production code.
> Each of those components executes in parallel, with the intermediate results streaming between them. You get a similar effect with coroutines.
Processes run in parallel, but they process the data in a strict sequential order: «grep» must produce a chunk of data before «sed» can proceed, and «sed» must produce another chunk of data before «xargs» can do its part. «xargs» in no way can ever pick up the output of «grep» and bypass the «sed» step. If the preceding step is busy crunching the data and is not producing the data, the subsequent step will be blocked (the process will fall asleep). So it is both, a pipeline and a chain.
It is actually a directed data flow graph.
Also, if you replace «haystack.txt» with a /dev/haystack, and /dev/haystack is waiting on the device it is attached to to yield a new chunk of data, then all three of «grep», «sed» and «xargs» will block.
Isn't the difference between a pipeline and a method chain that a pipeline doesn't have to wait for the previous process to complete in order to send results to the next step? Grep sends lines as it finds them to sed, and sed on to xargs, which acts as a sink to collect the data (and is necessary, otherwise wc -l would write out a series of ones).
Given File.readlines("haystack.txt"), the entire file must be resident in memory before .grep(/needle/) is performed, which may cause unnecessary memory utilization. IIRC, in frameworks like Polars, the collect() chain-ending method tells the compiler that the previous methods will be performed as a stream, and thus not require pulling the entirety into memory in order to perform an operation on a subset of the corpus.
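A small Polars sketch of that lazy/collect split (the file name and column are invented; the point is that scan_csv builds a lazy query and collect() is what actually executes it):

```python
import polars as pl

# scan_csv builds a lazy query plan instead of reading the file eagerly;
# filters can be pushed down, and collect() triggers execution.
needles = (
    pl.scan_csv("haystack.csv")
      .filter(pl.col("text").str.contains("needle"))
      .collect()
)
```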
I have to object against reusing the 'data' var. Make up a new name for each assignment in particular when types and data structures change (like the last step is switching from strings to ints).
I agree with this comment: https://news.ycombinator.com/item?id=43759814 that this pollutes the current scope, which is especially bad if scoping is not that narrow (as is the case in Python, where if-branches do not define their own scope; I don't know about Ruby).
Another problem of having different names for each step is that you can no longer quickly comment out a single step to try things out, which you can if you either have the pipeline or a single variable name.
In Python, steps like map() and filter() would execute concurrently, without large intermediate arrays. It lacks the chaining syntax for them, too.
Java streams are the closest equivalent, both by the concurrent execution model, and syntactically. And yes, the Java debugger can show you the state of the intermediate streams.
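For reference, a tiny sketch of what map() and filter() return in Python 3: they build lazy iterators, and no work happens until something consumes them:

```python
stages = filter(str.isdigit, map(str.strip, [" 1 ", "x", "2 "]))
# Nothing has run yet: map() and filter() just returned lazy iterator objects.
print(list(stages))  # ['1', '2'] -- the work happens only when list() consumes it
```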
If you work with I/O, where you can have all sorts of wrong/invalid data and I/O errors, the chaining is a nightmare, as each step in the chain can raise numerous different errors/exceptions.
The chaining really only works if your language is strongly typed and you are somewhat guaranteed that variables will be of the expected type.
Syntactic sugar can sometimes fool us into thinking the underlying process is more efficient or streamlined. As a new programmer, I probably would have assumed that "storing" `data` at each step would be more expensive.
It absolutely becomes very inefficient, though the threshold data set size varies according to context. Most languages don't have lightweight coroutines as an alternative (but see Lua!), so the convenient alternatives have a larger fixed cost. Plus, thanks to cache locality, processing a whole intermediate array at each step might be just as fast, or even faster, than switching back and forth for every data element, though coroutine-based approaches can also use buffering strategies, which not coincidentally is how pipes work.
But, yes, naive call chaining like that is sometimes a significant performance problem in the real world. For example, in the land of JavaScript. One of the more egregious examples I've personally seen was a Bash script that used Bash arrays rather than pipelines, though in that case it had to do with the loss of concurrency, not data churn.
For my Ruby example, each of those method calls will allocate an Array on the heap, where it will persist until all references are removed and the GC runs again. The extra overhead of the named reference is somewhere between Tiny and Zero, depending on your interpreter. No extra copies are made; it's just a reference.
In most compiled languages: the overhead is exactly zero. At runtime, nothing even knows it's called "data" unless you have debug symbols.
If these are going to be large arrays and you actually care about memory usage, you wouldn't write the code the way I did. You might use lazy enumerators, or just flatten it out into a simple procedure; either of those would process one line at a time, discarding all the intermediate results as it goes.
Also, "File.readlines(i).count" is an atrocity of wasted memory. If you care about efficiency at all, that's the first part to go. :)
We might be able to cross one more language off your wishlist soon: JavaScript is on the way to getting a pipeline operator; the proposal is currently at Stage 2.
It also has barely seen any activity in years. It is going nowhere. The TC39 committee is utterly dysfunctional and anti-progress, and will not let this or any other new syntax into JavaScript. Records and Tuples has just been killed, despite being cited in surveys as a major missing feature[1]. Pattern matching is stuck in stage 1 and hasn't been presented since 2022. Ditto for type annotations and a million other things.
Our only hope is if TypeScript finally gives up on the broken TC39 process and starts to implement its own syntax enhancements again.
[1] https://2024.stateofjs.com/en-US/usage/#top_currently_missin...
I was excited for that proposal, but it veered off course some years ago – some TC39 members have stuck to the position that without member property support or async/await support, they will not let the feature move forward.
It seems like most people are just asking for the simple function piping everyone expects from the |> syntax, but that doesn't look likely to happen.
I worry about "soon" here. I've been excited for this proposal for years now (8 maybe? I forget), and I'm not sure it'll ever actually get traction at this point.
A while ago, I wondered how close you could get to a pipeline operator using existing JavaScript features. In case anyone might like to have a look, I wrote a proof-of-concept function called "Chute" [1]. It chains function and method calls in a dot-notation style like the basic example below.
chute(7) // setup a chute and give it a seed value
.toString // call methods of the current data (parens optional)
.parseInt // send the current data through global native Fns
.do(x=>[x]) // through a chain of one or more local / inline Fns
.JSON.stringify // through nested global functions (native / custom)
.JSON.parse
.do(x=>x[0])
.log // through built in Chute methods
.add_one // global custom Fns (e.g. const add_one=x=>x+1)
() // end a chute with '()' and get the result
All of their examples are wordier than just function chaining and I worry they’ve lost the plot somewhere.
They list this as a con of F# (also Elixir) pipes:
value |> x=> x.foo()
The insistence on an arrow function is pure hallucination
value |> x.foo()
Should be perfectly achievable as it is in these other languages. What’s more, doing so removes all of the handwringing about await. And I’m frankly at a loss why you would want to put yield in the middle of one of these chains instead of after.
    params.get("user") |> create_user |> notify_admin
Even more concise, and it doesn't even require a special language feature; it's just regular syntax of the language (|> is a method like .get(...), so you could even write `params.get("user").|>(create_user)` if you wanted to).
In Elixir, ```Map.get("user") |> create_user |> notify_admin``` would also be valid, standard Elixir, just not idiomatic (parens are optional, but preferred in most cases, and one-line pipes are also frowned upon except for scripting).
I've been using Elixir for a long time and had that same hope after having experienced how clear, concise and maintainable apps can be when the core is all a bunch of pipelines (and the boundary does error handling using cases and withs). But having seen the pipe operator in Ruby, I now think it was a bad idea.
The problem is that method-chaining is common in several OO languages, including Ruby. This means the functions on an object return an object, which can then call other functions on itself. In contrast, the pipe operator calls a function, passing in what's on the left side of it as the first argument. In order to work properly, this means you'll need functions that take the data as the first argument and return the same shape, whether that's a list, a map, a string or a struct, etc.
When you add a pipe operator to an OO language where method-chaining is common, you'll start getting two different types of APIs and it ends up messier than if you'd just stuck with chaining method calls. I much prefer passing immutable data into a pipeline of functions as Elixir does it, but I'd pick method chaining over a mix of method chaining and pipelines.
I'm a big fan of the Elixir operator, and it should be standard in all functional programming languages. You need it because everything is just a function and you can't do anything like method chaining, because none of the return values have anything like methods. The |> is "just" syntax sugar for a load of nested functions. Whereas the Rust style method chaining doesn't need language support - it's more of a programming style.
Note also that it works well in Elixir because it was created at the same time as most of the standard library. That means that the standard library takes the relevant argument in the first position all the time. Very rarely do you need to pipe into the second argument (and you need a lambda or convenience function to make that work).
Agree. This is absolutely my fave part of Elixir. Whenever I can get something to flow elegantly thru a pipeline like that, I feel like it’s a win against chaos.
Yes, a small feature set is important, and adding the functional-style pipe to languages that already have chaining with the dot seems to clutter up the design space. However, dot-chaining has the severe limitation that you can only pass to the first or "this" argument.
Is there any language with a single feature that gives the best of both worlds?
The pipe operator relies on the first argument being the subject of the operation. A lot of languages have the arguments in a different order, and OO languages sometimes use function chaining to get a similar result.
IIRC the usual workaround in Elixir involves a small lambda that rearranges things:
"World"
|> then(&concat("Hello ", &1))
I imagine a shorter syntax could someday be possible, where some special placeholder expression could be used, ex:
"World"
|> concat("Hello ", &1)
However that creates a new problem: If the implicit-first-argument form is still permitted (foo() instead of foo(&1)) then it becomes confusing which function-arity is being called. A human could easily fail to notice the absence or presence of the special placeholder on some lines, and invoke the wrong thing.
I disagree, because then it can be very ambiguous with an existing `|` operator. The language has to be able to tell that this is a pipeline and not doing a bitwise or operation on the output of multiple functions.
Lisp macros allow a general solution to this that doesn't just handle chained collection operators but allows you to decide the order in which you write any chain of calls.
For example, we can write:
(foo (bar (baz x))) as
(-> x baz bar foo)
If there are additional arguments, we can accommodate those too:
(sin (* x pi)) as
(-> x (* pi) sin)
Where the expression so far gets inserted as the first argument to any form. If you want it inserted as the last argument, you can use ->> instead:
(filter positive? (map sin x)) as
(->> x (map sin) (filter positive?))
You can also get full control of where to place the previous expression using as->. Full details at https://clojure.org/guides/threading_macros
I find the threading operators in Clojure bring much joy and increase readability. I think it's interesting because it makes me actually consider function argument order much more because I want to increase opportunities to use them.
Yeah, I found this when I was playing around with Hy a while back. I wanted a generic `->` style operator, and it wasn't too much trouble to write a macro to introduce one.
That's sort of an argument for the existence of macros as a whole, you can't really do this as neatly in something like python (although I've tried) - I can see the downside of working in a codebase with hundreds of these kind of custom language features though.
Yes threading macros are so much nicer than method chaining, because it allows general function reuse, rather than being limited to the methods that happen to be defined in your initial data object.
A pipeline operator is just partial application with less power. You should be able to bind any number of arguments to any places in order to create a new function and "pipe" its output(s) to any other number of functions.
One day, we'll (re)discover that partial application is actually incredibly useful for writing programs and (non-Haskell) languages will start with it as the primitive for composing programs instead of finding out that it would be nice later, and bolting on a restricted subset of the feature.
I like partial application like in Standard ML, but it also means that one must be very careful with the order of arguments, unless we get a variant of partial application that is flexible enough to let you specify which arguments you want to provide, instead of always assuming the first n arguments. I use "cut" for this in Scheme. Threading/pipelines are still very useful though and can shorten things and make them very readable.
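In Python terms (a sketch): functools.partial always binds from the left, so for any other argument position you fall back to a lambda, which plays the role of Scheme's cut here:

```python
from functools import partial

def concat(a, b):
    return a + b

greet = partial(concat, "Hello ")    # binds the first argument
exclaim = lambda s: concat(s, "!")   # "cut"-style: binds the second argument

print(greet("World"))    # Hello World
print(exclaim("World"))  # World!
```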
Sure. But how do you write that in a way that is expressive, terse, and readable all at once? Nothing beats x | y | z or (-> x y z). The speed of both writing and reading (and comprehending), the sheer simplicity, is what makes pipelining useful in the first place.
I've always found magrittr mildly hilarious. R has vestigial Lisp DNA, but somehow the R implementation of pipes was incredibly long, complex and produced stack traces, so it moved to a native C implementation, which nevertheless has to manipulate the SEXPs that secretly underlie the language. Compared to something like Clojure's threading macros it's wild how much work is needed.
R, specifically tidyverse, has a special place in my heart. Tidy principles makes data analysis easy to read and easy to use new functions, since there are standards that must be met to call a function "tidy."
Recently I started using Nushell, which feels very similar.
R + tidyverse is the gold standard for working with data quickly in a readable and maintainable way, IMO. It's just absolutely seamless. Shoutout to tidyverts (https://tidyverts.org/) for working with time series, too
Pipelining looks nice until you have to debug it. And exception handling is also very difficult, because it means adding forks into your pipelines. Pipelines are only good for programming the happy path.
At the risk of over generalized pronouncements, ease of debugging is usually down to how well-designed your tooling happens to be. Most of the time the framework/language does that for you, but it's not the only option.
And for exceptions, why not solve it in the data model, and reify failures? Push it further downstream, let your pipeline's nodes handle "monadic" result values.
Point being, it's always a tradeoff, but you can usually lessen the pain more than you think.
And that's without mentioning that a lot of "pipelining" is pure sugar over the same code we're already writing.
Pipelining simplifies debugging. Each step is obvious and it is trivial to insert logging between pipeline elements. It is easier to debug than the patterns compared in the article.
Exception handling is only a problem in languages that use exceptions. Fortunately there are many modern alternatives in wide use that don't use exceptions.
This is my experience too - when the errors are encoded into the type system, this becomes easier to reason about (which is much of the work when you’re debugging).
I've encountered and used this pattern in Python, Ruby, Haskell, Rust, C#, and maybe some other languages. It often feels nice to write, but reading can easily become difficult -- especially in Haskell where obscure operators can contain a lot of magic.
Debugging them interactively can be equally problematic, depending on the tooling. I'd argue, it's commonly harder to debug a pipeline than the equivalent imperative code and, that in the best case it's equally hard.
I don't know what you're writing, but this sounds like language smell. If you can represent errors as data instead of exceptions (Either, Result, etc) then it is easy to see what went wrong, and offer fallback states in response to errors.
Programming should be focused on the happy path. Much of the syntax in primitive languages concerning exceptions and other early returns is pure noise.
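A minimal Python sketch of the errors-as-data idea, using an ad-hoc ('ok'/'err') tuple rather than any particular library:

```python
def parse_int(s):
    try:
        return ("ok", int(s))
    except ValueError:
        return ("err", f"not an integer: {s!r}")

def halve(n):
    return ("ok", n / 2) if n % 2 == 0 else ("err", f"{n} is odd")

def pipe_result(value, *steps):
    # Run the steps until one fails; the failure names exactly where it happened.
    for step in steps:
        tag, value = step(value)
        if tag == "err":
            return ("err", value)
    return ("ok", value)

print(pipe_result("42", parse_int, halve))  # ('ok', 21.0)
print(pipe_result("41", parse_int, halve))  # ('err', '41 is odd')
```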
Established debugging tools and logging rubric are not suitable for debugging heavily pipelined code. Stack traces, debuggers rely heavily on line based references which are less useful in this style and can make diagnostic practices feel a little clumsy.
The old adage of not writing code so smart you can’t debug it applies here.
Pipelining runs contrary enough to standard imperative patterns. You don’t just need a new mindset to write code this way. You need to think differently about how you structure your code overall and you need different tools.
That’s not to say that doing things a different way isn’t great, but it does come with baggage that you need to be in a position to carry.
Pipelining is just syntactic sugar for nested function calls.
If you need to handle an unhappy path in a way that isn’t optimal for nested function calls then you shouldn’t be nesting your function calls. Pipelining doesn’t magically make things easier nor harder in that regard.
But if a particular sequence of function calls does suit nesting, then pipelining makes the code much more readable, because you're not mixing right-to-left syntax (function nests) with left-to-right syntax (i.e. your typical language syntax).
Depends on the context - in a scripting language where you have some kind of console, you just copy the lines one at a time instead of all at once, and see what each pipe does one after another. This is pretty straightforward.
(Not talking about compiled code though)
While the author claims "semantics beat syntax every day of the week," the entire article focuses on syntax preferences rather than semantic differences.
Pipelining can become hard to debug when chains get very long. The author doesn't address how hard it can be to identify which step in a long chain caused an error.
They do make fun of Python, however. But don't say much about why they don't like it other than showing a low-res photo of a rock with a pipe routed around it.
Ambiguity about what constitutes "pipelining" is the real issue here. The definition keeps shifting throughout the article. Is it method chaining? Operator overloading? First-class functions? The author uses examples that function very differently.
> Pipelining can become hard to debug when chains get very long. The author doesn't address how hard it can be to identify which step in a long chain caused an error.
Yeah, I agree that this can be a problem when you lean heavily into monadic handling (i.e. you have fallible operations and then pipe the error or null all the way through, losing the information of where it came from).
But that doesn't have much to do with the article: You have the same problem with non-pipelined functional code. (And in either case, I think that it's not that big of a problem in practice.)
> The author uses examples that function very differently.
Yeah, this is addressed in one of the later sections. Imo, having a unified word for such a convenience feature (no matter how it's implemented) is better than thinking of these features as completely separate.
Yes, but here's my hot take - what if you didn't have to edit the source code to debug it? Instead of chaining method calls you just assign to a temporary variable. Then you can set breakpoints and inspect variable values like you do normally without editing source.
It's not like you lose that much readability from
foo(bar(baz(c)))
c |> baz |> bar |> foo
c.baz().bar().foo()
t = c.baz()
t = t.bar()
t = t.foo()
I think you may have misinterpreted his motive here.
Just before that statement, he says that it is an article/hot take about syntax. He acknowledges your point.
So I think when he says "semantics beat syntax every day of the week", that's him acknowledging that while he prefers certain syntax, it may not be the best for a given situation.
the paragraph you quoted (atm, 7 mins ago, did it change?) says:
>Let me make it very clear: This is [not an] article it's a hot take about syntax. In practice, semantics beat syntax every day of the week. In other words, don’t take it too seriously.
Effect supports both styles. Building pipelines: https://effect.website/docs/getting-started/building-pipelin...
Using generators: https://effect.website/docs/getting-started/using-generators...
Having both options is great (at the beginning Effect had only pipe-based pipelines); after years of writing Effect, I'm convinced that most of the time you'd rather write and read imperative code than pipelines, which definitely have their place in code bases.
In fact most of the community, at large, has converged on using imperative-style generators over pipelines; having onboarded many devs and seen many long-time pipeliners converge to classical imperative control flow seems to confirm that both debugging and maintenance are easier.
I've only ever heard the term 'pipelining' in reference to GPUs, or as an abstract umbrella term for moving data around.
Other than that I think both styles are fine.
> Java streams are the closest equivalent, both by the concurrent execution model, and syntactically
Iterators are not (necessarily) concurrent. I believe you mean lazily.
This helped me quickly develop a sense for how code is optimized and what code is eventually executed.
I do it in a similar way you mentioned
But with actually checked in code, the tradeoff in readability is pretty substantial
However.
I would be lying if I didn't secretly wish that all languages adopted the `|>` syntax from Elixir.
```
params
|> Map.get("user")
|> create_user()
|> notify_admin()
```
https://github.com/tc39/proposal-pipeline-operator
I'm very excited for it.
In Elixir, it is just a macro so it applies to all functions. I'm only a Scala novice so I'm not sure how it would work there.
This is usually the Thrush combinator[0]; it exists in other languages as well.
0 - https://leanpub.com/combinators/read#leanpub-auto-the-thrush
If you need to pipe into a later argument position in Elixir (here, the second argument of notify_admin), you end up wrapping the call in a capture or an anonymous function:
```
params
|> Map.get("user")
|> create_user()
|> (&notify_admin("signup", &1)).()
```
or
```
params
|> Map.get("user")
|> create_user()
|> (fn user -> notify_admin("signup", user) end).()
```
https://github.com/rplevy/swiss-arrows
https://github.com/hipeta/arrow-macros
[1] https://tour.gleam.run/functions/pipelines/
(No disagreements with your post, just want to give credit where it's due. I'm also a big fan of the syntax)
Instead of:
```
fetch_data()
|> (fn
  {:ok, val, _meta} -> val
  :error -> "default value"
end).()
|> String.upcase()
```
Something like this:
```
fetch_data()
|>? {:ok, val, _meta} -> val
|>? :error -> "default value"
|> String.upcase()
```
This is for sequential conditions. If you have nested conditions, check out a with block instead. https://dev.to/martinthenth/using-elixirs-with-statement-5e3...
I use these with xforms transducers.
https://github.com/johnmn3/injest
Rust chains everything because of this. It's often unpleasant (see: all the Rust GUI toolkits).