invalidator · 8 months ago
The author keeps calling it "pipelining", but I think the right term is "method chaining".

Compare with a simple pipeline in bash:

  grep needle < haystack.txt | sed 's/foo/bar/g' | xargs wc -l
Each of those components executes in parallel, with the intermediate results streaming between them. You get a similar effect with coroutines.

Compare Ruby:

  data = File.readlines("haystack.txt")
    .map(&:strip)
    .grep(/needle/)
    .map { |i| i.gsub('foo', 'bar') }
    .map { |i| File.readlines(i).count }
In that case, each line is processed sequentially, with a complete array being created between each step. Nothing actually gets pipelined.

Despite being clean and readable, I don't tend to do it any more, because it's harder to debug. More often these days, I write things like this:

  data = File.readlines("haystack.txt")
  data = data.map(&:strip)
  data = data.grep(/needle/)
  data = data.map { |i| i.gsub('foo', 'bar') }
  data = data.map { |i| File.readlines(i).count }
It's ugly, but you know what? I can set a breakpoint anywhere and inspect the intermediate states without having to edit the script in prod. Sometimes ugly and boring is better.

wahern · 8 months ago
> The author keeps calling it "pipelining", but I think the right term is "method chaining". [...] You get a similar effect with coroutines.

The inventor of the shell pipeline, Douglas McIlroy, always understood the equivalence between pipelines and coroutines; it was deliberate. See https://www.cs.dartmouth.edu/~doug/sieve/sieve.pdf It goes even deeper than it appears, too. In the original Unix kernel implementation of pipes, when the pipe buffer was filled[1] by the writer, the kernel continued execution directly in the blocked reader process without bouncing through the scheduler. Effectively, arguably literally, coroutines: one process calls the write function and execution continues with a read call returning the data.

Interestingly, Solaris Doors operate the same way by design--no bouncing through the scheduler--unlike pipes today where long ago I think most Unix kernels moved away from direct execution switching to better support multiple readers, etc.

[1] Or even on the first write? I'd have to double-check the source again.

marhee · 8 months ago
I don’t find your “seasoned developer” version ugly at all. It just looks more mature and relaxed. It also has the benefit that you can actually do error handling and have space to add comments. Maybe people don’t like it because of the repetition of “data =”, but in fact you could use descriptive new variable names, making the code even more readable (self-documenting). I’ve always felt method chaining looks “cramped”, if that’s the right word. Like a person drawing on paper but only using the upper left corner. However, this surely is also a matter of preference or what you’re used to.
freehorse · 8 months ago
I have a lot of code like this. The reason I prefer pipelines now is the mental overhead of understanding the intermediate step variables.

Something like

  lines = File.readlines("haystack.txt")
  stripped_lines = lines.map(&:strip)
  needle_lines = stripped_lines.grep(/needle/)
  transformed_lines = needle_lines.map { |line| line.gsub('foo', 'bar') }
  line_counts = transformed_lines.map { |file_path| File.readlines(file_path).count }
is hell to read and understand later imo. You have to read a lot of intermediate variables that do not matter anywhere else in the code once it is set up, but you do not necessarily know in advance which matter and which don't unless you read and understand all of it. It also pollutes your workspace with too much stuff, so while this makes debugging easier, it makes the code harder to read some time later. Moreover, it becomes even more clunky if you need to repeat the code; you probably need to define a function block then, which just moves the clunkiness there.

What I do now is start by defining the transformation in each step as a pure function, chain them once everything works, and enclose the whole thing in an error handler, so that I depend on breakpoint debugging less.

There is certainly a trade-off, but as a codebase grows larger and deals with more cases where the same code needs to be applied, the benefits of a concise yet expressive notation show.

deredede · 8 months ago
Code in this "named-pipeline" style is already self-documenting: using the same variable name makes it clear that we are dealing with a pipeline/chain. Using more descriptive names for the intermediate steps hides this, making each line more readable (and even then you're likely to end up with `dataStripped = data.map(&:strip)`) at the cost of making the block as a whole less readable.
pragma_x · 8 months ago
> Maybe people don’t like it because of the repetition of “data =“

Eh, at first glance it looks "amateurish" due to all the repeated stuff. Chaining explicitly eliminates redundant operations - a more minimal representation of data flow - so it looks more "professional". But I also know better than to act on that impulse. ;)

That said, it really depends on the language at play. Some will compile all the repetition of `data =` away such that the variable's memory isn't re-written until after the last operation in that list; it'll hang out in a register or on the stack somewhere. Others will run the code exactly as written, bouncing data between the heap, stack, and registers - inefficiencies and all.

IMO, a comment like "We wind up debugging this a lot, please keep this syntax" would go a long way to help the next engineer. Assuming that the actual processing dwarfs the overhead present in this section, it would be even better to add discrete exception handling and post-conditions to make it more robust.

ehnto · 8 months ago
In most debuggers I have used, if you put a breakpoint on the first line of the method chain, you can "step over" each function in the chain until you get to the one you want.

Bit annoying, but serviceable. Though there's nothing wrong with your approach either.

grimgrin · 8 months ago
debuggers can take it even further if they want that UX. in firefox given a chain of foo().bar().baz() you can set a breakpoint on any of 'em.

https://gist.github.com/user-attachments/assets/3329d736-70f...

runeks · 8 months ago
> The author keeps calling it "pipelining", but I think the right term is "method chaining".

Allow me, too, to disagree. I think the right term is "function composition".

Instead of writing

  h(g(f(x)))
as a way to say "first apply f to x, after which g is applied to the result of this, after which h is applied to the result of this", we can use function composition to compose f, g and h, and then "stuff" the value x into this "pipeline of composed functions".

We can use whatever syntax we want for that, but I like Elm syntax which would look like:

  x |> f >> g >> h
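
To illustrate with Ruby (the thread's running example language): Proc#>> composes callables left to right in the same spirit. A minimal sketch:

  f = ->(x) { x + 1 }
  g = ->(x) { x * 2 }
  h = ->(x) { x - 3 }

  pipeline = f >> g >> h   # "apply f, then g, then h"
  pipeline.call(10)        # => 19, same as h(g(f(10)))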

billdueber · 8 months ago
If you add in a call to `.lazy` it won't create all the intermediate arrays. It's been there since at least 2.7. https://ruby-doc.org/core-2.7.0/Enumerator/Lazy.html
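
For example, a lazy version of the pipeline from upthread might look something like this (a sketch; using File.foreach instead of readlines so the input streams too, and .force to realize the result):

  line_counts = File.foreach("haystack.txt").lazy
    .map(&:strip)
    .grep(/needle/)
    .map { |line| line.gsub('foo', 'bar') }
    .map { |path| File.foreach(path).count }
    .force   # nothing is evaluated until here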
dorfsmay · 8 months ago
I do the same with Python, replacing multilevel comprehensions with intermediary steps of generator expressions, which are lazy and therefore do not impact performance and memory usage.

https://peps.python.org/pep-0289/

zelphirkalt · 8 months ago
Ultimately it will depend on the functions being chained. If they can work with one part of the result, or a subset of parts, then they might not block; otherwise they will still need the complete result and laziness cannot help.
snthpy · 8 months ago
I think the best term is "function composition", but with a particular syntax so pipelining seems alright. Method chaining is a common case, where some base object is repeatedly modified by some action and then the object reference is returned by the "method", thus allowing the "chaining", but what if you're not dealing with objects and methods? The pipelined composition pattern is more general than method chaining imho.

You make an interesting point about debugging which is something I have also encountered in practice. There is an interesting tension here which I am unsure about how to best resolve.

In PRQL we use the pipelining approach by using the output of the last step as the implicit last argument of the next step. In M Lang (MS Power BI/Power Query), which is quite similar in many ways, they use the second approach in that each step has to be named. This is very useful for debugging, as you point out, but also a lot more verbose and can be tedious. I like both but prefer the ergonomics of PRQL for interactive work.

Update: Actually, PRQL has a decent answer to this. Say you have a query like:

    from invoices
    filter total > 1_000
    derive invoice_age = @2025-04-23 - invoice_date
    filter invoice_age > 3months
and you want to figure out why the result set is empty. You can pipe the results into an intermediate reference like so:

    from invoices
    filter total > 1_000
    into tmp
    
    from tmp
    derive invoice_age = @2025-04-23 - invoice_date
    filter invoice_age > 3months
So, good ergonomics on the happy path and a simple enough workaround when you need it. You can try these out in the PRQL Playground btw: https://prql-lang.org/playground/

AdieuToLogic · 8 months ago
> The author keeps calling it "pipelining", but I think the right term is "method chaining".

I believe the correct term for this concept is the Thrush combinator[0]. In some ML-based languages[1], such as F#, the |> operator is defined[2] for same:

  [1..10] |> List.map (fun i -> i + 1)
Other functional languages have libraries which also provide this operator, such as the Scala Mouse[3] project.

0 - https://leanpub.com/combinators/read#leanpub-auto-the-thrush

1 - https://en.wikipedia.org/wiki/ML_(programming_language)

2 - https://fsharpforfunandprofit.com/posts/defining-functions/

3 - https://github.com/typelevel/mouse?tab=readme-ov-file
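
For what it's worth, Ruby ships a thrush-like operation as Object#then (aka yield_self), which pipes the receiver into a block and returns the block's result; a small sketch mirroring the F# snippet:

  (1..10).to_a
    .then { |xs| xs.map { |i| i + 1 } }   # pipe the array into the block
    .then { |xs| xs.sum }                 # and pipe that result onward
  # => 65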

ehnto · 8 months ago
I'm not sure that's right, method chaining is just immediately acting on the return of the previous function, directly. It doesn't pass the return into the next function like a pipeline. The method must exist on the returned object. That is different to pipelines or thrush operators. Evaluation happens in the order it is written.

Unless I misunderstood the author, because method chaining is super common where I feel thrush operators are pretty rare, I would be surprised if they meant the latter.

ses1984 · 8 months ago
Shouldn’t modern debuggers be able to handle that easily? You can step in, step out, until you get where you want, or you could set a breakpoint in the method you want to debug instead of at the call site.
abirch · 8 months ago
Even if your debugger can't do that, an AI agent can easily change the code for you to add intermediate output.
refactor_master · 8 months ago
> Despite being clean and readable, I don't tend to do it any more, because it's harder to debug. More often these days, I write things like this:

    data = File.readlines("haystack.txt")
    data = data.map(&:strip)
    data = data.grep(/needle/)
    data = data.map { |i| i.gsub('foo', 'bar') }
    data = data.map { |i| File.readlines(i).count }
Hard disagree. It's less readable, the intent is unclear (where does it end?), the variable is rewritten on every step, and everything is named "data" (and please don't call them data_1, data_2, ...), so now you have to run a debugger to figure out what is even going on, rather than just... reading the code.

veidr · 8 months ago
The person you are quoting already conceded that it is less readable, but argued that the ability to set a breakpoint easily (without having to stop the process and modify the code) is more important.

I myself agree, and find myself doing that too, especially in frontend code that executes in a browser. Debuggability is much more important than marginally-better readability, for production code.

inkyoto · 8 months ago
> Each of those components executes in parallel, with the intermediate results streaming between them. You get a similar effect with coroutines.

Processes run in parallel, but they process the data in a strict sequential order: «grep» must produce a chunk of data before «sed» can proceed, and «sed» must produce another chunk of data before «xargs» can do its part. «xargs» can in no way pick up the output of «grep» and bypass the «sed» step. If the preceding step is busy crunching the data and is not producing output, the subsequent step will be blocked (the process will fall asleep). So it is both a pipeline and a chain.

It is actually a directed data flow graph.

Also, if you replace «haystack.txt» with a /dev/haystack, i.e.

  grep needle < /dev/haystack | sed 's/foo/bar/g' | xargs wc -l
and /dev/haystack is waiting on the device it is attached to to yield a new chunk of data, then all three of «grep», «sed» and «xargs» will block.

dzuc · 8 months ago
For debugging method chains you can just use `tap`
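
For example, tap yields the receiver to the block and returns the receiver, so peeks can be dropped into the chain from upthread without changing the result; a sketch:

  data = File.readlines("haystack.txt")
    .map(&:strip)
    .tap { |lines| warn "stripped: #{lines.size} lines" }   # peek on stderr, chain continues
    .grep(/needle/)
    .tap { |matches| p matches.first(3) }                    # inspect a sample
    .map { |line| line.gsub('foo', 'bar') }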
adolph · 8 months ago
Isn't the difference between a pipeline and a method chain that a pipeline doesn't have to wait for the previous process to complete in order to send results to the next step? Grep sends lines as it finds them to sed, and sed on to xargs, which acts as a sink to collect the data (and is necessary, otherwise wc -l would write out a series of ones).

Given File.readlines("haystack.txt"), the entire file must be resident in memory before .grep(/needle/) is performed, which may cause unnecessary memory utilization. IIRC, in frameworks like Polars, the collect() chain-ending method tells the compiler that the previous methods will be performed as a stream, and thus do not require pulling the entire corpus into memory in order to operate on a subset of it.

mystified5016 · 8 months ago
Yeah, I've always heard this called method chaining. It's widespread in C#, particularly with LINQ (which was explicitly designed to leverage it).

I've only ever heard the term 'pipelining' in reference to GPUs, or as an abstract umbrella term for moving data around.

3np · 8 months ago
I have to object to reusing the 'data' var. Make up a new name for each assignment, in particular when types and data structures change (like the last step switching from strings to ints).

Other than that I think both styles are fine.

hiq · 8 months ago
I agree with this comment: https://news.ycombinator.com/item?id=43759814 that this pollutes the current scope, which is especially bad if scoping is not that narrow (the case in Python, where if-branches do not define their own scope; I don't know about Ruby).

Another problem of having different names for each step is that you can no longer quickly comment out a single step to try things out, which you can if you either have the pipeline or a single variable name.

nine_k · 8 months ago
In Python, steps like map() and filter() would execute concurrently, without large intermediate arrays. It lacks the chaining syntax for them, too.

Java streams are the closest equivalent, both by the concurrent execution model, and syntactically. And yes, the Java debugger can show you the state of the intermediate streams.

maleldil · 8 months ago
> would execute concurrently

Iterators are not (necessarily) concurrent. I believe you mean lazily.

slt2021 · 8 months ago
If you work with I/O, where you can have all sorts of wrong/invalid data and I/O errors, chaining is a nightmare, as each step in the chain can have numerous different errors/exceptions.

Chaining really only works if your language is strongly typed and you are somewhat guaranteed that variables will be of the expected type.

axblount · 8 months ago
Syntactic sugar can sometimes fool us into thinking the underlying process is more efficient or streamlined. As a new programmer, I probably would have assumed that "storing" `data` at each step would be more expensive.
wahern · 8 months ago
It absolutely becomes very inefficient, though the threshold data set size varies according to context. Most languages don't have lightweight coroutines as an alternative (but see Lua!), so the convenient alternatives have a larger fixed cost. Plus, cache locality means materializing each intermediate result can be fine, or even better, compared with switching back and forth for every data element; though coroutine-based approaches can also use buffering strategies, which, not coincidentally, is how pipes work.

But, yes, naive call chaining like that is sometimes a significant performance problem in the real world. For example, in the land of JavaScript. One of the more egregious examples I've personally seen was a Bash script that used Bash arrays rather than pipelines, though in that case it had to do with the loss of concurrency, not data churn.

invalidator · 8 months ago
It depends on the language you're using.

For my Ruby example, each of those method calls will allocate an Array on the heap, where it will persist until all references are removed and the GC runs again. The extra overhead of the named reference is somewhere between Tiny and Zero, depending on your interpreter. No extra copies are made; it's just a reference.

In most compiled languages: the overhead is exactly zero. At runtime, nothing even knows it's called "data" unless you have debug symbols.

If these are going to be large arrays and you actually care about memory usage, you wouldn't write the code the way I did. You might use lazy enumerators, or just flatten it out into a simple procedure; either of those would process one line at a time, discarding all the intermediate results as it goes.

Also, "File.readlines(i).count" is an atrocity of wasted memory. If you care about efficiency at all, that's the first part to go. :)

bjoli · 8 months ago
Reading this, I am so happy that my first language was a scheme where I could see the result of the first optimization passes.

This helped me quickly develop a sense for how code is optimized and what code is eventually executed.

Deleted Comment

raverbashing · 8 months ago
Exactly that. It looks nice but it's annoying to debug

I do it in a similar way you mentioned

jjfoooo4 · 8 months ago
I think updating the former to the latter when you are actually debugging something isn’t that big of a deal.

But with actually checked in code, the tradeoff in readability is pretty substantial

bnchrch · 8 months ago
I'm personally someone who advocates for languages to keep their feature set small and shoot to achieve a finished feature set quickly.

However.

I would be lying if I didn't secretly wish that all languages adopted the `|>` syntax from Elixir.

  params
  |> Map.get("user")
  |> create_user()
  |> notify_admin()

Cyykratahk · 8 months ago
We might be able to cross one more language off your wishlist soon: JavaScript is on the way to getting a pipeline operator, and the proposal is currently at Stage 2.

https://github.com/tc39/proposal-pipeline-operator

I'm very excited for it.

chilmers · 8 months ago
It also has barely seen any activity in years. It is going nowhere. The TC39 committee is utterly dysfunctional and anti-progress, and will not let this or any other new syntax into JavaScript. Records and tuples have just been killed, despite being cited in surveys as a major missing feature[1]. Pattern matching is stuck in stage 1 and hasn't been presented since 2022. Ditto for type annotations and a million other things.

Our only hope is if TypeScript finally gives up on the broken TC39 process and starts to implement its own syntax enhancements again.

[1] https://2024.stateofjs.com/en-US/usage/#top_currently_missin...

TehShrike · 8 months ago
I was excited for that proposal, but it veered off course some years ago – some TC39 members have stuck to the position that without member property support or async/await support, they will not let the feature move forward.

It seems like most people are just asking for the simple function piping everyone expects from the |> syntax, but that doesn't look likely to happen.

zdragnar · 8 months ago
I worry about "soon" here. I've been excited for this proposal for years now (8 maybe? I forget), and I'm not sure it'll ever actually get traction at this point.
gregabbott · 8 months ago
A while ago, I wondered how close you could get to a pipeline operator using existing JavaScript features. In case anyone might like to have a look, I wrote a proof-of-concept function called "Chute" [1]. It chains function and method calls in a dot-notation style like the basic example below.

  chute(7)        // setup a chute and give it a seed value
  .toString       // call methods of the current data (parens optional)
  .parseInt       // send the current data through global native Fns
  .do(x=>[x])     // through a chain of one or more local / inline Fns
  .JSON.stringify // through nested global functions (native / custom)
  .JSON.parse
  .do(x=>x[0])
  .log            // through built in Chute methods
  .add_one        // global custom Fns (e.g. const add_one=x=>x+1)
  ()              // end a chute with '()' and get the result
[1] https://chute.pages.dev/ | https://github.com/gregabbott/chute

rossriley · 8 months ago
PHP RFC for version 8.5 too: https://wiki.php.net/rfc/pipe-operator-v3
hinkley · 8 months ago
All of their examples are wordier than just function chaining and I worry they’ve lost the plot somewhere.

They list this as a con of F# (also Elixir) pipes:

    value |> x=> x.foo()
The insistence on an arrow function is pure hallucination

    value |> x.foo()
Should be perfectly achievable as it is in these other languages. What’s more, doing so removes all of the handwringing about await. And I’m frankly at a loss why you would want to put yield in the middle of one of these chains instead of after.

hoppp · 8 months ago
Cool I love it, but another thing we will need polyfills for...
valenterry · 8 months ago
I prefer Scala. You can write

  params.get("user") |> create_user |> notify_admin

Even more concise, and it doesn't even require a special language feature; it's just regular syntax of the language (`|>` is a method like `.get(...)`, so you could even write `params.get("user").|>(create_user)` if you wanted to).

elbasti · 8 months ago
In Elixir, `Map.get("user") |> create_user |> notify_admin` would also be valid, standard Elixir, just not idiomatic (parens are optional but preferred in most cases, and one-line pipes are also frowned upon except for scripting).
agent281 · 8 months ago
Isn't it being a method call not quite equivalent? Are you able to define the method over arbitrary data types?

In Elixir, it is just a macro so it applies to all functions. I'm only a Scala novice so I'm not sure how it would work there.

AdieuToLogic · 8 months ago
> I would be lying if I didn't secretly wish that all languages adopted the `|>` syntax from Elixir.

This is usually the Thrush combinator[0], which exists in other languages as well, and can be informally defined as:

  f(g(x)) = g(x) |> f
0 - https://leanpub.com/combinators/read#leanpub-auto-the-thrush

Munksgaard · 8 months ago
Not quite. Note that the Elixir pipe puts the left hand of the pipe as the first argument in the right-hand function. E.g.

    x |> f(y) = f(x, y)
As a result, the Elixir variant cannot be defined as a well-typed function, but must be a macro.

AlchemistCamp · 8 months ago
I've been using Elixir for a long time and had that same hope after having experienced how clear, concise and maintainable apps can be when the core is all a bunch of pipelines (and the boundary does error handling using cases and withs). But having seen the pipe operator in Ruby, I now think it was a bad idea.

The problem is that method-chaining is common in several OO languages, including Ruby. This means the functions on an object return an object, which can then call other functions on itself. In contrast, the pipe operator calls a function, passing in what's on the left side of it as the first argument. To work properly, this means you'll need functions that take the data as the first argument and return the same shape, whether that's a list, a map, a string or a struct, etc.

When you add a pipe operator to an OO language where method-chaining is common, you'll start getting two different types of APIs and it ends up messier than if you'd just stuck with chaining method calls. I much prefer passing immutable data into a pipeline of functions as Elixir does it, but I'd pick method chaining over a mix of method chaining and pipelines.

rkangel · 8 months ago
I'm a big fan of the Elixir operator, and it should be standard in all functional programming languages. You need it because everything is just a function and you can't do anything like method chaining, because none of the return values have anything like methods. The |> is "just" syntax sugar for a load of nested functions. Whereas the Rust style method chaining doesn't need language support - it's more of a programming style.

Note also that it works well in Elixir because it was created at the same time as most of the standard library. That means that the standard library takes the relevant argument in the first position all the time. Very rarely do you need to pipe into the second argument (and you need a lambda or convenience function to make that work).

matthewsinclair · 8 months ago
Agree. This is absolutely my fave part of Elixir. Whenever I can get something to flow elegantly thru a pipeline like that, I feel like it’s a win against chaos.
mvieira38 · 8 months ago
R has a lovely toolkit for data science using this syntax, called the tidyverse. My favorite dev experience, it's so easy to just write code
jasperry · 8 months ago
Yes, a small feature set is important, and adding the functional-style pipe to languages that already have chaining with the dot seems to clutter up the design space. However, dot-chaining has the severe limitation that you can only pass to the first or "this" argument.

Is there any language with a single feature that gives the best of both worlds?

Deleted Comment

bnchrch · 8 months ago
FWIW you can pass to arguments other than the first in this syntax:

  params
  |> Map.get("user")
  |> create_user()
  |> (&notify_admin("signup", &1)).()

or

  params
  |> Map.get("user")
  |> create_user()
  |> (fn user -> notify_admin("signup", user) end).()

AndyKluger · 8 months ago
Do concatenative langs like Factor fit the bill?
hinkley · 8 months ago
The pipe operator relies on the first argument being the subject of the operation. A lot of languages have the arguments in a different order, and OO languages sometimes use function chaining to get a similar result.
Terr_ · 8 months ago
IIRC the usual workaround in Elixir involves a small lambda that rearranges things:

    "World"
    |> then(&concat("Hello ", &1))

I imagine a shorter syntax could someday be possible, where some special placeholder expression could be used, ex:

    "World"
    |> concat("Hello ", &1)
However that creates a new problem: If the implicit-first-argument form is still permitted (foo() instead of foo(&1)) then it becomes confusing which function-arity is being called. A human could easily fail to notice the absence or presence of the special placeholder on some lines, and invoke the wrong thing.

sparkie · 8 months ago
You could make use of `flip` from Haskell.

    flip :: (x -> y -> z) -> (y -> x -> z)
    flip f = \y -> \x -> f x y

    x |> (flip f)(y)    -- f(x, y)

Alupis · 8 months ago
Pipelines are one of the greatest Gleam features[1].

[1] https://tour.gleam.run/functions/pipelines/

dorian-graph · 8 months ago
I wouldn't say it's a Gleam feature per se, in that it's not something Gleam added that isn't already in Elixir.
bradford · 8 months ago
I hate to be that guy, but I believe the `|>` syntax started with F# before Elixir picked it up.

(No disagreements with your post, just want to give credit where it's due. I'm also a big fan of the syntax)

ghthor · 8 months ago
It's older than F#; it's been an ML-language thing for a while, but I'm not sure where it first appeared.
Symmetry · 8 months ago
I feel like Haskell really missed a trick by having $ not go the other way, though it's trivial to make your own symbol that goes the other way.
jose_zap · 8 months ago
Haskell has & which goes the other way:

    users
      & map validate
      & catMaybes
      & mapM persist

manmal · 8 months ago
I wish there were a variation that can destructure more ergonomically.

Instead of:

  fetch_data()
  |> (fn
    {:ok, val, _meta} -> val
    :error -> "default value"
  end).()
  |> String.upcase()

Something like this:

  fetch_data()
  |>? {:ok, val, _meta} -> val
  |>? :error -> "default value"
  |> String.upcase()

smallerize · 8 months ago

  fetch_data()
  |> case do
      {:ok, val, _meta} -> val
      :error -> "default value"
  end
You have the extra "case do...end" block but it's pretty close?

This is for sequential conditions. If you have nested conditions, check out a `with` block instead. https://dev.to/martinthenth/using-elixirs-with-statement-5e3...

Deleted Comment

layer8 · 8 months ago
It would be even better without the `>`, though. The `|>` is a bit awkward to type, and more noisy visually.
MyOutfitIsVague · 8 months ago
I disagree, because then it can be very ambiguous with an existing `|` operator. The language has to be able to tell that this is a pipeline and not doing a bitwise or operation on the output of multiple functions.

Deleted Comment

neonsunset · 8 months ago
Elixir itself adopted this operator from F#
Straw · 8 months ago
Lisp macros allow a general solution to this that doesn't just handle chained collection operators but allows you to decide the order in which you write any chain of calls.

For example, we can write: (foo (bar (baz x))) as (-> x baz bar foo)

If there are additional arguments, we can accommodate those too: (sin (* x pi)) as (-> x (* pi) sin)

The expression so far gets inserted as the first argument to any form. If you want it inserted as the last argument, you can use ->> instead:

(filter positive? (map sin x)) as (->> x (map sin) (filter positive?))

You can also get full control of where to place the previous expression using as->.

Full details at https://clojure.org/guides/threading_macros

gleenn · 8 months ago
I find the threading operators in Clojure bring much joy and increase readability. I think it's interesting because it makes me actually consider function argument order much more because I want to increase opportunities to use them.
aeonik · 8 months ago
These threading macros can increase performance, the developer even has a parallelizing threading macro.

I use these with xforms transducers.

https://github.com/johnmn3/injest

benrutter · 8 months ago
Yeah, I found this when I was playing around with Hy a while back. I wanted a generic `->` style operator, and it wasn't too much trouble to write a macro to introduce one.

That's sort of an argument for the existence of macros as a whole; you can't really do this as neatly in something like Python (although I've tried). I can see the downside of working in a codebase with hundreds of these kinds of custom language features, though.

sooheon · 8 months ago
Yes threading macros are so much nicer than method chaining, because it allows general function reuse, rather than being limited to the methods that happen to be defined in your initial data object.
duped · 8 months ago
A pipeline operator is just partial application with less power. You should be able to bind any number of arguments to any places in order to create a new function and "pipe" its output(s) to any other number of functions.

One day, we'll (re)discover that partial application is actually incredibly useful for writing programs and (non-Haskell) languages will start with it as the primitive for composing programs instead of finding out that it would be nice later, and bolting on a restricted subset of the feature.
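
A rough Ruby sketch of that idea, using Proc#curry to bind arguments and #>> to pipe the result along (the lambdas are invented for illustration):

  add   = ->(a, b) { a + b }
  scale = ->(k, x) { k * x }

  step1 = add.curry[10]       # partially applied: x -> x + 10
  step2 = scale.curry[3]      # partially applied: x -> 3 * x

  pipeline = step1 >> step2   # compose: x -> 3 * (x + 10)
  pipeline.call(2)            # => 36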

zelphirkalt · 8 months ago
I like partial application like in Standard ML, but it also means that one must be very careful with the order of arguments, unless we get a variant of partial application that is flexible enough to let you specify which arguments you want to provide, instead of always assuming the first n arguments. I use "cut" for this in Scheme. Threading/pipelines are still very useful though, and can shorten things and make them very readable.
dayvigo · 8 months ago
Sure. But how do you write that in a way that is expressive, terse, and readable all at once? Nothing beats x | y | z or (-> x y z). The speed of both writing and reading (and comprehending), the sheer simplicity, is what makes pipelining useful in the first place.
gpderetta · 8 months ago
for loops are also gotos with less power, yet we usually prefer them.
choult · 8 months ago
... and then recreate the scripting language...
stogot · 8 months ago
I was just thinking does this not sound like a shell language? Using | instead of .function()
SimonDorfman · 8 months ago
The tidyverse folks in R have been using that for a while: https://magrittr.tidyverse.org/reference/pipe.html
thom · 8 months ago
I've always found magrittr mildly hilarious. R has vestigial Lisp DNA, but somehow the R implementation of pipes was incredibly long, complex and produced stack traces, so it moved to a native C implementation, which nevertheless has to manipulate the SEXPs that secretly underlie the language. Compared to something like Clojure's threading macros it's wild how much work is needed.
madcaptenor · 8 months ago
And base R has had a pipe for a couple years now, although there are some differences between base R's |> and tidyverse's %>%: https://www.tidyverse.org/blog/2023/04/base-vs-magrittr-pipe...
steine65 · 8 months ago
R, specifically tidyverse, has a special place in my heart. Tidy principles make data analysis easy to read and make it easy to pick up new functions, since there are standards that must be met to call a function "tidy."

Recently I started using Nushell, which feels very similar.

flobosg · 8 months ago
Base R as well: |> was implemented as a pipe operator in 4.1.0.
tylermw · 8 months ago
Importantly, the base R pipe implements the operation at the language parsing level, so it has basically zero overhead.
mvieira38 · 8 months ago
R + tidyverse is the gold standard for working with data quickly in a readable and maintainable way, IMO. It's just absolutely seamless. Shoutout to tidyverts (https://tidyverts.org/) for working with time series, too
amai · 8 months ago
Pipelining looks nice until you have to debug it. And exception handling is also very difficult, because it means adding forks into your pipelines. Pipelines are only good for programming the happy path.
mpalmer · 8 months ago
At the risk of overgeneralized pronouncements, ease of debugging is usually down to how well-designed your tooling happens to be. Most of the time the framework/language does that for you, but it's not the only option.

And for exceptions, why not solve it in the data model, and reify failures? Push it further downstream, let your pipeline's nodes handle "monadic" result values.

Point being, it's always a tradeoff, but you can usually lessen the pain more than you think.

And that's without mentioning that a lot of "pipelining" is pure sugar over the same code we're already writing.
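
A rough Ruby sketch of what reifying failures can look like: each step returns [:ok, value] or [:error, reason], and a tiny helper (invented here for illustration) only runs the next step on :ok, so an error flows untouched to the end of the pipeline:

  def step(result)
    status, value = result
    status == :ok ? yield(value) : result   # an error short-circuits the remaining steps
  end

  result = [:ok, "haystack.txt"]
  result = step(result) { |path| File.exist?(path) ? [:ok, File.readlines(path)] : [:error, "missing #{path}"] }
  result = step(result) { |lines| [:ok, lines.grep(/needle/)] }
  result = step(result) { |lines| lines.empty? ? [:error, "no matches"] : [:ok, lines.length] }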

eikenberry · 8 months ago
Pipelining simplifies debugging. Each step is obvious and it is trivial to insert logging between pipeline elements. It is easier to debug than the patterns compared in the article.

Exception handing is only a problem in languages that use exceptions. Fortunately there are many modern alternatives in wide use that don't use exceptions.

switchbak · 8 months ago
This is my experience too - when the errors are encoded into the type system, this becomes easier to reason about (which is much of the work when you’re debugging).
w4rh4wk5 · 8 months ago
Yes, certainly!

I've encountered and used this pattern in Python, Ruby, Haskell, Rust, C#, and maybe some other languages. It often feels nice to write, but reading can easily become difficult -- especially in Haskell where obscure operators can contain a lot of magic.

Debugging them interactively can be equally problematic, depending on the tooling. I'd argue, it's commonly harder to debug a pipeline than the equivalent imperative code and, that in the best case it's equally hard.

jim-jim-jim · 8 months ago
I don't know what you're writing, but this sounds like language smell. If you can represent errors as data instead of exceptions (Either, Result, etc) then it is easy to see what went wrong, and offer fallback states in response to errors.

Programming should be focused on the happy path. Much of the syntax in primitive languages concerning exceptions and other early returns is pure noise.

rusk · 8 months ago
Established debugging tools and logging rubrics are not well suited to heavily pipelined code. Stack traces and debuggers rely heavily on line-based references, which are less useful in this style and can make diagnostic practices feel a little clumsy.

The old adage of not writing code so smart you can’t debug it applies here.

Pipelining runs contrary enough to standard imperative patterns. You don’t just need a new mindset to write code this way. You need to think differently about how you structure your code overall and you need different tools.

That’s not to say that doing things a different way isn’t great, but it does come with baggage that you need to be in a position to carry.

hnlmorg · 8 months ago
Pipelining is just syntactic sugar for nested function calls.

If you need to handle an unhappy path in a way that isn’t optimal for nested function calls then you shouldn’t be nesting your function calls. Pipelining doesn’t magically make things easier nor harder in that regard.

But if a particular sequence of function calls does suit nesting, then pipelining makes the code much more readable, because you're not mixing right-to-left syntax (function nests) with left-to-right syntax (i.e. your typical language syntax).

EVa5I7bHFq9mnYK · 8 months ago
I think they are talking about nested loops, not nested function calls.

Deleted Comment

bsder · 8 months ago
Pipelining is also nice until you have to use it for everything because you can't do alternatives (like default function arguments) properly.

Rust chains everything because of this. It's often unpleasant (see: all the Rust GUI toolkits).

bergen · 8 months ago
Depends on the context - in a scripting language where you have some kind of console, you can just copy the lines up to a given step and see what each pipe does, one after another. This is pretty straightforward. (Not talking about compiled code though)
kordlessagain · 8 months ago
While the author claims "semantics beat syntax every day of the week," the entire article focuses on syntax preferences rather than semantic differences.

Pipelining can become hard to debug when chains get very long. The author doesn't address how hard it can be to identify which step in a long chain caused an error.

They do make fun of Python, however. But don't say much about why they don't like it other than showing a low-res photo of a rock with a pipe routed around it.

Ambiguity about what constitutes "pipelining" is the real issue here. The definition keeps shifting throughout the article. Is it method chaining? Operator overloading? First-class functions? The author uses examples that function very differently.

Mond_ · 8 months ago
> Pipelining can become hard to debug when chains get very long. The author doesn't address how hard it can be to identify which step in a long chain caused an error.

Yeah, I agree that this can be problem when you lean heavily into monadic handling (i.e. you have fallible operations and then pipe the error or null all the way through, losing the information of where it came from).

But that doesn't have much to do with the article: You have the same problem with non-pipelined functional code. (And in either case, I think that it's not that big of a problem in practice.)

> The author uses examples that function very differently.

Yeah, this is addressed in one of the later sections. Imo, having a unified word for such a convenience feature (no matter how it's implemented) is better than thinking of these features as completely separate.

zelphirkalt · 8 months ago
You can add peek steps in pipelines and inspect the in between results. Not really any different from normal function call debugging imo.
krapht · 8 months ago
Yes, but here's my hot take - what if you didn't have to edit the source code to debug it? Instead of chaining method calls you just assign to a temporary variable. Then you can set breakpoints and inspect variable values like you do normally without editing source.

It's not like you lose that much readability from

  foo(bar(baz(c)))

  c |> baz |> bar |> foo

  c.baz().bar().foo()

  t = c.baz()
  t = t.bar()
  t = t.foo()

bena · 8 months ago
I think you may have misinterpreted his motive here.

Just before that statement, he says that it is an article/hot take about syntax. He acknowledges your point.

So I think when he says "semantics beat syntax every day of the week", that's him acknowledging that while he prefers certain syntax, it may not be the best for a given situation.

fsckboy · 8 months ago
the paragraph you quoted (atm, 7 mins ago, did it change?) says:

>Let me make it very clear: This is [not an] article it's a hot take about syntax. In practice, semantics beat syntax every day of the week. In other words, don’t take it too seriously.

AYBABTME · 8 months ago
It's just as difficult to debug when function calls are nested inline instead of assigning to variables and passing the variables around.
steine65 · 8 months ago
Agreed that long chains are hard to debug. I like to keep chains around the size of a short paragraph.
pavel_lishin · 8 months ago
The article also clearly points out that it's just a hot take, and to not take it too seriously.
epolanski · 8 months ago
I personally like how effect-ts allows you to write both pipelines or imperative code to express the very same things.

Building pipelines:

https://effect.website/docs/getting-started/building-pipelin...

Using generators:

https://effect.website/docs/getting-started/using-generators...

Having both options is great (at the beginning Effect had only pipe-based pipelines). After years of writing Effect I'm convinced that most of the time you'd rather write and read imperative code than pipelines, which definitely have their place in codebases.

In fact most of the community at large has converged on using imperative-style generators over pipelines. Having onboarded many devs, and having seen many long-time pipeliners converge on classical imperative control flow, seems to confirm that both debugging and maintenance are easier that way.