Array Languages: R vs. APL (2023)

One of the wildest R features I know of comes as a result of lazy argument evaluation combined with the ability to programmatically modify the set of variable bindings. This means that functions can define local variables that are usable by their arguments (i.e. `f(x+1)` can use a value of `x` that is provided from within `f` when evaluating `x+1`). This is used extensively in practice in the dplyr, ggplot, and other tidyverse libraries.
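Python has no lazy arguments, so the closest sketch passes the argument as source text and evaluates it in a namespace the function builds itself. The `filter_rows` helper and its list-of-dicts data are hypothetical. It is only loosely modelled on what `dplyr::filter(df, x > 2)` does when `x` is a column name:

```python
def filter_rows(rows: list[dict], predicate: str) -> list[dict]:
    """Keep the rows for which `predicate` is true.

    The predicate arrives unevaluated (as source text) and is evaluated once
    per row with that row's columns bound as local variables, so `x` inside
    it means "the x column", not some Python variable at the call site.
    """
    code = compile(predicate, "<predicate>", "eval")
    return [row for row in rows if eval(code, {"__builtins__": {}}, dict(row))]


rows = [{"x": 1, "y": "a"}, {"x": 3, "y": "b"}, {"x": 5, "y": "c"}]
print(filter_rows(rows, "x > 2"))  # [{'x': 3, 'y': 'b'}, {'x': 5, 'y': 'c'}]
```

R does this without the string: the promise for `x > 2` is simply evaluated later in an environment the function chose.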

I think software engineers often get turned off by the weird idiosyncrasies of R, but there are surprisingly unique (arguably helpful) language features most people don't notice, possibly because most of the learning material is data-science focused and so doesn't emphasize the bonkers language features that R has.

empyrrhicist · a year ago

I saw a funny presentation where Doug Bates said something like: "This kind of evaluation opens the door to do many strange and unspeakable things in R... for some reason Hadley Wickham is very excited about this."

crest · a year ago

Unspeakable horrors like changing `$[` in old Perl5 versions to mess with someone's mind? Who doesn't like array indices starting at 0, 1, ... or 42?

kkoncevicius · a year ago

One of the stranger behaviours for me is that R allows you to combine infix operators with assignments, even though there are no implemented instances of it in R itself. For example:

  `%in%<-` <- function(x, y, value) { x[x %in% y] <- value; x}

  x <- c("a", "b", "c", "d")
  x %in% c("a", "c") <- "o"
  x
  [1] "o" "b" "o" "d"

Or slightly crazier:

  `<-<-` <- function(x, y, value) paste0(y, "_", value)

  "a" -> x <- "b"
  x
  [1] "a_b"

Antoine Fabri and I created a package that uses this behaviour for some clever replacement operators [1], but beyond that I don't see where this could be useful in real practice.

[1]: https://github.com/moodymudskipper/inops
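The mechanism behind this is R's replacement-function rule: an assignment of the form `f(x, ...) <- value` is evaluated roughly as ``x <- `f<-`(x, ..., value = value)``, and `x %in% y` is just the call `` `%in%`(x, y) ``. Python has no such sugar, so this sketch (with a hypothetical `in_assign` helper) spells the rewrite out by hand:

```python
def in_assign(x: list, y, value) -> list:
    """Hypothetical analogue of the `%in%<-` above: return a copy of x with
    every element that appears in y replaced by value."""
    return [value if item in y else item for item in x]


x = ["a", "b", "c", "d"]
# R turns `x %in% c("a", "c") <- "o"` into roughly
# x <- `%in%<-`(x, c("a", "c"), value = "o"): call the function, then rebind x.
x = in_assign(x, {"a", "c"}, "o")
print(x)  # ['o', 'b', 'o', 'd']
```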

Avshalom · a year ago

For those who haven't run into anything about this corner of R before:

https://blog.moertel.com/posts/2006-01-20-wondrous-oddities-...

broomcorn · a year ago

That sounds like asking for trouble. Someone coming from any other programming language could easily forget that expression evaluation is stateful. Better to be explicit and create an object representing an expression. Tell me, at least, that the variable is immutable in that context?

bnprks · a year ago

The good news is that most variables in R are immutable with copy-on-write semantics. Therefore, most of the time everything here will be side-effect-free and any weird editing of the variable bindings is confined to within the function. (The cases that would have side effects are very uncommonly used, in my experience.)

nerdponx · a year ago

The whole magic is that expressions are in fact just objects in the language. And no, there aren't any immutable bindings here.

VTimofeenko · a year ago

Asking out of lack of experience with R: how does such an invocation handle the case where `x` is defined with a different value at the call site?

In pseudocode:

  f =
  let x = 1 in # inner vars for f go here
  arg -> arg + 1 # function logic goes here

  # example one: no external value
  f (x+1) # produces 3 (arg := (x+1) = 2; return arg +1)

  # example two: x is defined in the outer scope
  let x = 4 in
  f (x+2) # produces 7 if outer x wins (arg := 4+2 = 6; return arg + 1)? Or 4 if inner x wins (arg := 1+2 = 3)?

bnprks · a year ago

If the function chooses to overwrite the value of a variable binding, it doesn't matter how it is defined at the call site (so inner x wins in your example). In the tidyverse libraries, they often populate a lazy list variable (think python dictionary) that allows disambiguating in the case of name conflicts between the call site and programmatic bindings. But that's fully a library convention and not solved by the language.
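A minimal Python sketch of the "inner x wins" behaviour, assuming the unevaluated argument is passed as source text plus a dict of the caller's bindings (Python has no lazy arguments). The `caller` entry is a hypothetical stand-in for the kind of disambiguating list described above, not a real library feature:

```python
def f(expr: str, caller_env: dict) -> int:
    """Evaluate `expr` lazily, with f's own x shadowing the caller's x.

    `caller` is a hypothetical pronoun that still gives explicit access to
    the call site's bindings when a name is shadowed.
    """
    env = {**caller_env, "caller": caller_env, "x": 1}  # f's x overrides the caller's
    arg = eval(expr, {"__builtins__": {}}, env)
    return arg + 1


print(f("x + 1", {}))                  # 3: arg = 1 + 1
print(f("x + 2", {"x": 4}))            # 4: f's x = 1 wins, arg = 3
print(f("caller['x'] + 2", {"x": 4}))  # 7: explicit access to the call site's x
```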

hugh-avherald · a year ago

Well the point is that the function can define its own logic to determine the behaviour. Users can also (with some limits) restrict the variable scope.


csimon80 · a year ago

A lot of the time you're not actually using the value passed to the function, but the name of the argument passed to it (f(x) instead of f('x')), which helps the user write their query (dplyr) or configuration (ggplot2).

delusional · a year ago

> I think software engineers often get turned off by the weird idiosyncrasies of R

That was at least true when I was looking at it. I didn't get it, but the data guys came away loving it. I came away from that whole experience really appreciating how far you can get with an "unclean" design if you persist, and how my gut feeling of good (with all the heuristics for quality that entails) is really very domain specific.

HdS84 · a year ago

I once needed to implement an API in R; let's just say that having three or four object-oriented systems did not help at all.

staplung · a year ago

I had a colleague at Google who used to say: "The best thing about R is that it was created by statisticians. The worst thing about R is that it was created by statisticians."

Where / is for reduce, +. is for the GCD, and *. is for the LCM. The basic idea of J notation is using some small change to mean the contrary, for example {. for first and {: for last, {. for take and }. for drop (one symbol can be used as a unary or binary operator with a different meaning). So if floor is <. you can guess what will be the symbol for roof. For another example, /:~ is for sorting in ascending order, and I imagine that you can guess the symbol for sorting in descending order. In a sense, J notation includes some semantic meaning, so an LLM could use that notation to try to change an algorithm. Perhaps someone could think about how to expand this idea so that LLMs generate new algorithms.

  m ; (+/ m) ; >./ +/ m
  ┌─────┬───────┬──┐
  │0 1 2│9 12 15│15│
  │3 4 5│       │  │
  │6 7 8│       │  │
  └─────┴───────┴──┘
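For readers without J, a rough Python/NumPy analogue of these reductions, as a sketch (it assumes Python 3.9+ for `math.lcm`; the sample list is made up). J's `u/` folds `u` between items, which `functools.reduce` also does; J folds from the right, but for associative operations like these the result is the same. On a matrix, J's `+/` adds the rows together item by item, which is NumPy's `sum(axis=0)`:

```python
from functools import reduce
from math import gcd, lcm

import numpy as np

nums = [12, 18, 30]
gcd_all = reduce(gcd, nums)  # like +./ nums
lcm_all = reduce(lcm, nums)  # like *./ nums
print(gcd_all, lcm_all)      # 6 180

m = np.arange(9).reshape(3, 3)       # the matrix in the boxed output
col_totals = m.sum(axis=0)           # like +/ m: the rows added item by item
print(col_totals, col_totals.max())  # like (+/ m) ; >./ +/ m
```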

> "So, would APL be “readable” if I was more familiar with it? Let’s find out!"

An alternative test for this hypothesis might have been to use J, an array language based on APL and by the designer of APL, but using only ASCII characters.

nonfamous · a year ago

R itself could be considered a test of this hypothesis, too. It’s been said that elegant, powerful Lisp would be more widely adopted if it weren’t for all those gosh-darned parentheses.

Well, at its core R is a Lisp (specifically, Scheme) but with a more traditional syntax (infixed operators, function calls, etc). And it’s fair to say the adoption of R has, indeed, been more widespread than that of Lisp.

dan-robertson · a year ago

I’m not totally convinced that being ‘secretly a lisp’ is what was good about R. I think the easy vectorisation is good, and the consequences of the bizarre function argument evaluation are good. I don’t know of lisps that do the vectorisation stuff so naturally, and while I guess fexprs are a thing, I think they are possibly too general in the syntax they can accept – basically the simplicity of lisp syntax allows macros to have more tree-structured input in a way you wouldn’t want for a language with non-lisp syntax (where the head lives outside the list), and I think the flexibility makes the syntax more confusingly non-uniform.

mik1998 · a year ago

I'm not sure I would come to this conclusion. R has some adoption, but it's also really not used as a generic programming language, which most Lisp dialects are.

seanhunter · a year ago

As someone who loved learning lisp and regrets that the long course of my programming career has never led me to use it in a professional capacity, I just don't buy it when people say that parentheses are the reason people didn't adopt lisp more widely. I would say the main reasons are:

1) The language is so frikkin massive. Common lisp is a huge language with hundreds and hundreds of built-in functions etc and the standard came very late in its evolution so there is a bunch of back compat cruft and junk that everyone has to live with. The object system is a whole epic journey in itself. You could probably kill or at least seriously injure someone with the impact if they were lying down and you dropped a copy of Guy Steele's excellent book[1] on them from a standing height.

2) The ecosystem is so fragmented. First you have Common Lisp, which isn't very common at all. Then you have all the vendor lisps. Then you have to contend with whether they do or don't have CLOS. Elisp is a lisp but is not Common Lisp and differs in some important ways that I don't quite remember. Then there's Scheme, and Guile Scheme (which isn't quite the same), then Clojure, etc etc.

3) That meant that the tooling was basically all simultaneously amazing and awful. As an example my uncle wrote a tcp/ip stack in lisp for the symbolics lisp machine[2] for a project when he worked at xerox. He told me in the late 80s about features in the symbolics debugger that just totally blew my mind and are only now available in IDEs for other languages, like being able to step backwards, alter variables, then step forward again, jump to any stack frame and just resume execution from there etc etc. On the other hand he had to write the TCP/IP stack himself because they didn't have one. I think that perfectly encapsulates the lisp experience for me around 2000 when I last used it - some things worked amazingly and were way better than anything else (eg I remember at the time the things you could do with serialization being just extraordinary compared to other languages) but a bunch of basic stuff was painful, janky or just completely missing.

4) Some of the concepts are very powerful but result in programs that are incredibly hard to understand. Macros, continuation passing, multiple dispatch.. etc etc. This puts a lot of people off because they just hit the learning cliff face-first and give up.

This is part of why python saw such wide adoption in my opinion. Not because it was in any sense the best language, but it was a very easy, practical choice for doing a bunch of things.

[1] https://www.cs.cmu.edu/Groups/AI/html/cltl/cltl2.html . Paul Graham (yes that Paul Graham) wrote a good lisp book also, although for me Steele is the one.

[2] https://en.wikipedia.org/wiki/Symbolics

anthk · a year ago

J is standalone; it doesn't use APL in the background.

pavon · a year ago

J primitives are easier to type, but they aren't any more readable or familiar to newcomers than APL symbols.

anthk · a year ago

Well at least you can define new tokens with ease.

    from numpy import arange
    from numpy.random import randint

    def d(x): N = arange(1, x + 1); return N[x % N == 0]  # divisors of x
    def m(x, n): return x * arange(1, n + 1)               # first n multiples of x
    def gcd(x, y): return max(set(d(x)) & set(d(y)))
    def lcm(x, y): return min(set(m(x, y)) & set(m(y, x)))
    def test(x, y): return gcd(x, y) * lcm(x, y) == x * y

    all([test(x, y) for (x, y) in randint(1, 100, (1000, 2))])  # True

  : find-gcd ( nums -- gcd ) [ infimum ] [ supremum ] bi gcd nip ;
  : max-wealth ( accounts -- n ) [ sum ] map-supremum ;
  : which-max-wealth ( accounts -- i ) [ sum ] supremum-by* drop ;
  primes-upto
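For comparison, a rough Python equivalent of the three Factor words above, as a sketch. The function names just mirror the Factor ones, and the sample inputs are made up for illustration:

```python
from math import gcd


def find_gcd(nums: list[int]) -> int:
    # GCD of the smallest and largest numbers
    return gcd(min(nums), max(nums))


def max_wealth(accounts: list[list[int]]) -> int:
    # Largest row total
    return max(sum(row) for row in accounts)


def which_max_wealth(accounts: list[list[int]]) -> int:
    # 0-based index of the first row with the largest total
    totals = [sum(row) for row in accounts]
    return totals.index(max(totals))


print(find_gcd([2, 5, 6, 9, 10]))                  # 2
print(max_wealth([[1, 2, 3], [3, 2, 1]]))          # 6
print(which_max_wealth([[1, 5], [7, 3], [3, 5]]))  # 1
```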

jonocarroll · a year ago

(author here, still getting over the first time I've seen one of my own posts on this site)

The many recommendations for J here are a great nudge for me to give it a proper go. I've taken quite a liking to the traditional APL glyphs (see a photo of the stickers on my laptop keys in this post: https://jcarroll.com.au/2023/12/10/advent-of-array-elegance/), so I'm not looking for a way to avoid them.

Another criticism I've seen around is about the ambivalence of APL glyphs (a glyph can take either 1 or 2 arguments and do something different in each case). I don't particularly mind it, because I think it becomes more natural to "understand" how a function is being used the more familiar you become with it, but if there were no limit on the number of glyphs, I can see the benefit of separating those.

Can’t the second argmax example be written with a right tack? Is it nicer then?

  (⊢⍳⌈/)+/x

Yep, that makes for a nicer tacit solution

  maxrow←(⊢⍳⌈/)+/

but I find

  ⊃(⍒+/)

to be an even cleaner tacit solution.
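Translated into NumPy, the two tacit solutions compute the same thing in different ways. This is a sketch with hypothetical function names. Note that APL's `+/` on a matrix sums along the last axis (row totals), and APL indices start at 1 by default, whereas these return 0-based positions:

```python
import numpy as np


def maxrow_index_of(x: np.ndarray) -> int:
    # (⊢⍳⌈/)+/x : row totals, then the first position of their maximum
    totals = x.sum(axis=1)
    return int(np.flatnonzero(totals == totals.max())[0])


def maxrow_grade_down(x: np.ndarray) -> int:
    # ⊃(⍒+/) : sort positions by row total, descending, ties kept in order; take the first
    totals = x.sum(axis=1)
    return int(np.argsort(-totals, kind="stable")[0])


x = np.array([[1, 2], [5, 1], [3, 3]])
print(maxrow_index_of(x), maxrow_grade_down(x))  # 1 1
```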

AndyKluger · a year ago

It's not glyph-ish like those APL-style languages you like, but have you given Factor a good go?

I have not. I've done at least one problem (and in some, many more) in each of 32 languages on Exercism so far, though. Looking at your example, there are some familiar features from the Lisp and ML families.

bruturis · a year ago

>> find the GCD (greatest common divisor) of the smallest and largest numbers in an array

Just for a short comparison, in J the analogous code is </ +. >/

The matrix m, the sum of the rows, and the maximum of the sum of the rows in J (separated by ;)

KarlKode · a year ago

I think you mistyped the J code. I don't know any J, but from what I understood from your comment it should be something like

  </ +. >/ *.

You are right, the correct code is <./ +. >./

To understand this you need to know that <. and >. are the min and max functions, and that in J three functions separated by spaces, f g h, constitute a new function mathematically defined by (f g h)(x) = g(f(x), h(x)). An example is (+/ % #), which applied to a list gives the mean of the list: +/ gives the total, # gives the number of elements, and % is the quotient.
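The fork rule is easy to mimic with higher-order functions. A minimal Python sketch (the `fork` helper is hypothetical, not part of any library):

```python
import operator
from math import gcd


def fork(f, g, h):
    """Monadic J fork: (f g h) y  is  (f y) g (h y)."""
    return lambda y: g(f(y), h(y))


mean = fork(sum, operator.truediv, len)  # like (+/ % #)
gcd_min_max = fork(min, gcd, max)        # like (<./ +. >./)

print(mean([1, 2, 3, 4]))         # 2.5
print(gcd_min_max([12, 18, 30]))  # 6
```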

kqr · a year ago

> So if floor is <. you can guess what will be the symbol for roof.

Based on the examples, no, I cannot. It could be either of <: and >.

You are right, both are good options, the author of J chose >. for ceiling and >: for greater than or equal.

weinzierl · a year ago

olliej · a year ago

I personally think APL is wonderful simply because of the original APL-specific keyboard [1]

I've looked briefly at R and found the syntax and semantics to be less than stellar. Obviously there's going to be some bias in that sentiment due to my not generally doing "array programming", but I don't believe the things that irked me were entirely a result of that.

The more annoying stuff about R is entirely second hand. As far as I can tell, R (or at least RStudio) maintains implicit state between runs, which means you can get to a position where the same code works on some runs and then not on later runs. My friend was having to do a lot of bioinformatics processing (many of the libraries for this are in R) and was constantly fighting to get the code she wrote to process the data or produce charts to work (publications in bioinformatics have an acceptance bias for "looks like it came from R" that is similar to what CS [used to?] have for gnuplot). You could run the same scripts on the same input and have them fail where previously they worked. This is before you deal with inter-version compatibility problems, which also seemed frequent.

What was irksome to me, looking at a lot of the stuff they were doing, is that it was fundamentally mostly basic scripting you could do trivially in other languages (and more cleanly, imo), but there were a bunch of functions (built in or from libraries?) that did the work, and those functions weren't written in R, so the claims that R was "necessary" seemed fairly bogus to me.

[1] https://en.wikipedia.org/wiki/APL_(programming_language)#/me...

goosedragons · a year ago

You can save your workspace (state) in R. It's generally bad practice to do so.

R is VERY VERY good at handling tabular data. Python can get kind of close with Pandas but IMO, it's still more awkward than base R data frames and way worse than data.table.

R also has a lot of built-ins geared for statistics and built by statisticians. If you're doing statistics, there's value in not having to find a library or libraries that do that.

crispyambulance · a year ago

  > [R/RStudio] maintains implicit state between runs...

That can be turned off, and in fact it is widely recommended not to keep one's workspace between runs.

  > This is before you deal with inter-version compatibility problems which also seemed frequent.

Yeah, that can be a problem with libraries (as it is with Python dependencies). It really afflicts long-running projects. R has taken a cue from the Python world there: renv is the best way (IMHO) to maintain a reproducible environment in R (https://rstudio.github.io/renv/articles/renv.html).

R is nicely cogent in syntax and largely "just works" once you accept its idiosyncrasies.

rzmmm · a year ago

R has a lot of high-quality packages that implement, e.g., frequently used sophisticated regression analysis algorithms. Python has these too, but in my experience they are not as well tested and suffer from bugs.

cl3misch · a year ago

> what if we just generate all products from the set of numbers 2:n and exclude those as "not prime" from all the numbers up to n?

It's fun to translate terse APL to somewhat terse numpy. The result still can be very compact and you can parse it easily if you're used to looking at numpy:

    from numpy import arange, outer
    s = arange(2, 50); p = outer(s, s).ravel(); sorted(set(s) - set(p))

What's interesting there is that numpy is inspired (more than a little) by APL and aims to bring that 'array' thinking to python. I agree that thinking in this 'array' way helps to better construct a solution in any language, so I'm leaning towards 'designing' with APL glyphs, even if that's not the language I'm implementing the thing in.

If it takes any inspiration from APL, it would be mostly indirect, via Matlab.

Analogous code in J,

  /:~ s -. p [ p =: s*/s [ s=: 2+i.48

An exercise for numpy: test that GCD(x,y) * LCM(x,y) = x*y using 1000 random pairs in the range 0..99 for x and y.

  test =:  (* = *. * +.) & ?
  *./ test~  1000 # 100
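One possible vectorized numpy answer to the exercise, sketched with NumPy's element-wise `np.gcd` and `np.lcm` ufuncs instead of explicit divisor and multiple sets. The random generator setup is an assumption; any source of integers in 0..99 works:

```python
import numpy as np

rng = np.random.default_rng()
# 1000 random pairs in 0..99, drawn independently for x and y (like J's & ?)
x = rng.integers(0, 100, size=1000)
y = rng.integers(0, 100, size=1000)

# gcd and lcm broadcast element-wise, so the whole test is one expression
ok = np.all(np.gcd(x, y) * np.lcm(x, y) == x * y)
print(ok)  # True
```

Zeros are fine here: NumPy defines `gcd(0, y) = y` and `lcm(0, y) = 0`, so both sides are 0.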

Thanks, that was fun.

I am not a good golfer. Now I want to look at the codegolf stackexchange for this...

bear8642 · a year ago

Equivalently in APL:

  ⎕io←0               ⍝ to match J
  test ← (× = ∨ × ∧) ⍥ ?   ⍝ over: apply ? to both arguments, like J's &
  ∧⌿ test⍨ 1000⍴100

Not an array language (AFAIU), but here are some of the mentioned problems solved in (glorious) Factor:

J is interesting too, without the non-ASCII mess:

https://www.jsoftware.com/indexno.html

https://code.jsoftware.com/wiki/System/Installation <- install

https://code.jsoftware.com/wiki/Guides/Getting_Started <- help

nmz · a year ago

You can ignore this, given that I haven't used either APL or J seriously, but if I were to truly dive in, I'd lean towards APL precisely because of its non-ASCII, symbolic leanings. The only similar thing I know of is operator overloading, and whenever that is used, I have to relearn what each operator does in a given context. The exception is something you use regularly, like regex: it changes the meaning of the operators, but since it's an entire DSL, it is different enough that I don't think + means sum. If an entirely different symbol is introduced, I haven't already assigned any meaning to it, which is why I think it should be easier.