KlongPy: High-Performance Array Programming in Python

I’ve tried several times to give J/k/Q/kdb a chance, but I haven’t found a convincing example of why this approach is better than, say, SQL or numpy/jax.

The syntax has the same problem as perl in that you have to learn too many symbols that are hard to look up. And this combined with the tacit style makes it difficult to parse what the code is doing.

I think ruby went in the radically opposite direction in that it provides a similar feature set to perl with similar functional features but everything is human readable text. I feel like there’s room for a human readable version of J that relies less on syntactical sugar.

I do think the direction Klong has gone with the syntax is an improvement: https://t3x.org/klong/ambiguity.html

I’m curious if anyone has a go-to article/example of why an array language would be a good choice for a particular problem over other languages?

I know the second chapter of J for C programmers is kinda like that but I didn’t really find it convincing: https://www.jsoftware.com/help/jforc/culture_shock.htm#_Toc1...

082349872349872 · a year ago

The biggest advantage of the terse symbolic style for me comes when you wish to rapidly compare different ways (algorithms) to compute the same function. For instance, Kadane's Algorithm usually takes a dozen+ lines in an algoly language, which would normally be presented in some kind of syntax-coloured code block external to descriptive paragraphs; it takes a dozen- characters in an array language, and therefore variations can be included inline to text describing the alternatives.
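As a rough illustration of the size difference described above, here is a minimal Python sketch of Kadane's algorithm written twice: once as an explicit loop and once as a scan followed by a reduction, which is closer to how an array language would phrase it. The function names and sample data are made up for this example, and the scan version is still far wordier than the array-language one-liner the comment has in mind.

```python
from itertools import accumulate


def max_subarray_loop(xs):
    """Kadane's algorithm as an explicit loop (non-empty subarrays only)."""
    best = current = xs[0]
    for x in xs[1:]:
        # Either extend the running subarray or start a new one at x.
        current = max(x, current + x)
        best = max(best, current)
    return best


def max_subarray_scan(xs):
    """The same computation as a running scan followed by a max reduction."""
    return max(accumulate(xs, lambda acc, x: max(x, acc + x)))


data = [-2, 1, -3, 4, -1, 2, 1, -5, 4]  # hypothetical sample input
print(max_subarray_loop(data), max_subarray_scan(data))
```

Both versions do the same work; the second just names the pattern (scan, then reduce) instead of spelling out the loop.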

RyanHamilton · a year ago

https://www.timestored.com/b/kdb-qsql-query-vs-sql/ How many database tables have a date-time column and a natural ordering? Most of the data I look at. Which makes it crazy that SQL is based on unordered sets.

faizshah · a year ago

Thanks for the link I think this is a really interesting example:

> In qSQL this is: aj[`sym`time; t; q], which means perform an asof-join on t, looking up the nearest match from table q based on the sym and time column.

> In standard SQL, again you’ll have difficulty: sql nearest date, sql closest date even just the closest lesser date isn’t elegant. One solution would be:

> WITH cte AS (SELECT t.sym, t.time, q.bid, ROW_NUMBER() OVER (PARTITION BY t.ID, t.time ORDER BY ABS(DATEDIFF(dd, t.time, q.time))) AS rowNum FROM t LEFT JOIN q ON t.sym = q.sym) SELECT sym,time,bid FROM cte WHERE rowNum = 1

> It’s worth pointing out this is one of the queries that is typically extremely slow (minutes) on row-oriented databases compared to column-oriented databases (at most a few seconds).

This is a really nice example, but I think it’s more about the asof join being a really useful operation; it appears in both pandas (https://pandas.pydata.org/docs/reference/api/pandas.merge_as...) and DuckDB (https://duckdb.org/docs/guides/sql_features/asof_join.html).

Pandas:

> pd.merge_asof(t, q, on='time', by='sym', direction='backward')

Duckdb:

> SELECT t.sym, t.time, q.value FROM t ASOF JOIN q ON t.sym = q.sym AND t.time >= q.time;

So it seems more like this is benefitting from a useful feature of time series databases rather than the features of an APL-family language.

Personally I find the pandas syntax to be the most straightforward here.
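For readers who have not met the operation before, here is a plain-Python sketch of what a backward asof join computes: for each row of t, take the most recent row of q with the same sym whose time is not later. This is only an illustration of the semantics, not how kdb+, pandas, or DuckDB implement it. The dict-of-rows layout, the integer timestamps, and the function name are assumptions made for the example; `bid` is borrowed from the SQL quoted above.

```python
from bisect import bisect_right
from collections import defaultdict


def asof_join(trades, quotes):
    """Attach to each trade the latest quote with the same sym and time <= trade time."""
    # Group quotes by sym, sorted by time, so each lookup is a binary search.
    by_sym = defaultdict(list)
    for q in sorted(quotes, key=lambda q: q["time"]):
        by_sym[q["sym"]].append(q)
    times = {sym: [q["time"] for q in qs] for sym, qs in by_sym.items()}

    joined = []
    for t in trades:
        ts = times.get(t["sym"], [])
        i = bisect_right(ts, t["time"])  # count of quotes at or before the trade
        bid = by_sym[t["sym"]][i - 1]["bid"] if i else None
        joined.append({**t, "bid": bid})
    return joined


# Hypothetical sample tables.
t = [{"sym": "A", "time": 5}, {"sym": "A", "time": 1}, {"sym": "B", "time": 7}]
q = [
    {"sym": "A", "time": 2, "bid": 10.0},
    {"sym": "A", "time": 4, "bid": 11.0},
    {"sym": "B", "time": 7, "bid": 20.0},
]
result = asof_join(t, q)
```

A trade with no earlier quote gets `None`, which mirrors the missing value a left-style asof join would produce.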

ramses0 · a year ago

There was a step-change improvement for me when I tried expressing some JS patterns via `underscore.js` instead of procedurally: eg: http://underscorejs.org/#each

Thinking of something as `each | map | filter | sum` is waaay less buggy than writing bespoke procedural code to do the same thing. No doubt there is a "cost" to it as well, but the _abstraction_ is valuable.

Now, if there were a "compiler" which could optimize that whole pipeline down and squeeze out the inefficiencies between steps because it could "see" the whole program at the same time. (oh, I don't know, something like `SELECT * FROM foo WHERE a > 100 AND b < 9000 LEFT JOIN bar ON ...etc...`)

...perhaps you could get both an expressivity gain (by using higher level concepts than "for" and "while"), a reduction in bugs (because you're not re-implementing basic work-a-day procedures), and an improvement in efficiency (because the "compiler" can let you express things "verbosely", while it sorts out the details of efficiency gains that would be tenuous to express and keep up to date by hand).
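A small Python sketch of the contrast drawn above, with a hand-written loop next to the same computation phrased as filter, map, and sum. The order records and function names are invented for the example.

```python
def total_procedural(orders):
    """Bespoke loop: filtering, transforming and summing are interleaved by hand."""
    total = 0
    for order in orders:
        if order["qty"] > 0:
            total += order["qty"] * order["price"]
    return total


def total_pipeline(orders):
    """The same result as filter | map | sum."""
    valid = filter(lambda o: o["qty"] > 0, orders)
    amounts = map(lambda o: o["qty"] * o["price"], valid)
    return sum(amounts)


# Hypothetical sample data.
orders = [
    {"qty": 2, "price": 5},
    {"qty": 0, "price": 100},
    {"qty": 3, "price": 1},
]
print(total_procedural(orders), total_pipeline(orders))
```

In Python 3, `filter` and `map` are lazy, so the pipeline version still makes a single pass over the data without building intermediate lists. That is a very modest version of the "compiler sees the whole pipeline" idea.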

faizshah · a year ago

I 100% agree. I think the functional features that have been added across all the popular languages (map, reduce, fold etc.) have been a positive. Nothing demonstrates it better (imo) than purrr in R: https://github.com/rstudio/cheatsheets/blob/main/purrr.pdf

I also think there is some merit to “high syntactical density”: clearly, if you can see the entire code in one place instead of having to navigate through many files or sections, that’s beneficial. (Heavily discussed in the last big HN thread: https://news.ycombinator.com/item?id=38981639)

I also think JQ has proven the merit of tacit functional languages in that you can concisely write arbitrary transforms/queries on json that can be more expressive than SQL (many SQL engines have added JSONPath anyway). And I also think postfix is great for processing pipelines.

But I am not totally convinced by the APL/J/Q/KDB combination of terse style + prefix + tacit, because it makes the code so difficult to read. I think if you took an approach similar to JQ, where operators were human-readable words instead of symbols, it would be easier to get halfway to trying out the verbs, adverbs, etc. approach of the APL family. The problem with human-readable text is that you lose the conciseness that is part of the draw of the APL family: they want high syntactic density and code that is analogous to mathematical expressions.

snthpy · a year ago

That's kind of what we're aiming for with PRQL (prql-lang.org). While currently it only supports SQL backends, I want to generalize that to things like underscore.js.

I'm gonna be the one who asks the dumb question, but someone has to do it: why are expressions evaluated from right to left?

abrudz · a year ago

Think "normal" call syntax like

  foo(bar(baz(42)))

and then remove the superfluous parens

  foo bar baz 42

The expression is evaluated from right to left.

Now, let's make two of the functions into object members:

  A.foo(bar(B.baz(42)))

Remove the parens, extracting the methods from their objects, instead feeding each object as a left argument to its former member function:

  A foo bar B baz 42

This is normal APL-style call syntax; right-to-left if you want.
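To see what this rule means for arithmetic, here is a toy Python evaluator for a flat token list that applies every operator right to left with no precedence table. It is a sketch of the evaluation order only and handles two-argument operators only; the token format and function name are assumptions for the example, not how any APL-family interpreter is written.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def eval_right_to_left(tokens):
    """Evaluate e.g. [2, "*", 3, "+", 4] with no precedence, right to left."""
    *rest, result = tokens
    # Walk leftwards, consuming one (left operand, operator) pair at a time.
    while rest:
        *rest, left, op = rest
        result = OPS[op](left, result)
    return result


print(eval_right_to_left([2, "*", 3, "+", 4]))    # 2 * (3 + 4)
print(eval_right_to_left([10, "-", 3, "-", 2]))   # 10 - (3 - 2)
```

Everything to the right of an operator is its right argument, so `2 * 3 + 4` gives 14 rather than the 10 that school precedence rules would produce.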

amarcheschi · a year ago

Oh, now I see it. It sort of reminds me of lambda calculus.

bear8642 · a year ago

Because that's what APL does: https://www.jsoftware.com/papers/EvalOrder.htm

It also simplifies things, as there is no need to implement BIDMAS.

RyanHamilton · a year ago

Thanks for asking. I would guess you're asking as someone that spent years learning math notations and being taught BODMAS operator precedence. The funny thing is that if you took a 3 year old child and taught them right to left it's actually more natural than multiplication before addition. Array languages often take a fresh first principles approach rather than regurgitating common learnings. This does mean programmers from other languages can find it more confusing than total beginners.

nils-m-holm · a year ago

Good to see KlongPy thrive! It is based on Klong (http://t3x.org/klong/), which I do not maintain any more, so I am glad that Brian took over!

wvlia5 · a year ago

Hey, I read your yoga book today.

Good to hear! I do more yoga than programming these days. :)

dang · a year ago

Related. Others?

KlongPy: Vectorized port of Klong array language - https://news.ycombinator.com/item?id=35400742 - April 2023 (8 comments)

Klong: a Simple Array Language - https://news.ycombinator.com/item?id=21854793 - Dec 2019 (73 comments)

Statistics with the array language Klong - https://news.ycombinator.com/item?id=15579024 - Oct 2017 (12 comments)

Klong – a simple array language - https://news.ycombinator.com/item?id=10586872 - Nov 2015 (21 comments)

ktm5j · a year ago

Not everyone has read everything that's ever been posted to HN. There's some utility in reposting, which I think is evident based on how many upvotes this has gotten. I'd never seen this and I'm here almost every day.

If you've already seen this, and aren't interested in another look then just move on.

solumunus · a year ago

They’re posting related discussions so that interested readers can read further.

kstrauser · a year ago

This is awesome. I’ve wanted to play with array languages before but they tend to be kind of a pain in the neck to start with locally. I like the idea of hacking my way around in the new language while keeping a Python escape hatch close by in case I haven’t yet learned how to do a thing the new way.

Nice.

eismcc · a year ago

KlongPy author here: AMA

sevensor · a year ago

Would it make sense to think of this as a compact syntax for numpy? When it comes to array operations, are there differences that go deeper than the syntax?

KlongPy has a lot of other features beyond pure NumPy operations (such as IPC and a web server), which you could see as putting array operations to use in an application. You could look at the core Klong language as what you suggest.

tveita · a year ago

Is the full rendered documentation available anywhere?

https://github.com/briangu/klongpy/blob/main/docs/quick-star... links to an "API Reference" and a "REPL Reference", but the links are broken.

Not yet. I want to make an online book.

jayavanth · a year ago

Why are so many array programming alternatives to numpy/scipy popping up recently? Is there a fundamental flaw or showstopper in numpy?

Numpy is just the backend and KlongPy has a lot more application features than Numpy. You can see something like kdb+ as inspiration.

Qem · a year ago

On Linux, rlwrap is used to get the REPL working. Is it possible to get the REPL working under PowerShell on a Windows box too?

I don’t use Windows, so I’m not sure.

solidsnack9000 · a year ago

I gather where it says `:monad` it is referring to an operation that had an effect on the interpreter state?

Pompidou · a year ago

No. In APL-derived array programming languages, verbs (or functions) are monadic or dyadic: they accept only one or two arguments.

In '1 + 1', + is a dyadic operator, while in 'exp 5', exp is monadic.

In J, and in APL I guess, left arg is usually understood as 'control data', while right arg is the data upon which calculation is done. Left argument is usually left unchanged after calculations.

In this way, it is possible to create multi-argument verbs by placing boxed args on the left of the verb.
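A loose Python analogy for the terminology, assuming nothing about how Klong or J implement verbs: one function that behaves "monadically" with one argument and "dyadically" with two, plus a verb that takes several control values packed into its left argument. Both function names are hypothetical.

```python
def minus(*args):
    """One symbol, two valences: negate with one argument, subtract with two."""
    if len(args) == 1:
        (right,) = args
        return -right            # monadic use: negate
    if len(args) == 2:
        left, right = args
        return left - right      # dyadic use: subtract
    raise TypeError("a verb takes one (monadic) or two (dyadic) arguments")


def between(bounds, x):
    """Dyadic verb whose left argument packs two control values (lo, hi)."""
    lo, hi = bounds
    return lo <= x <= hi


print(minus(5), minus(7, 5), between((1, 10), 4))
```

The tuple passed to `between` plays the role of the boxed left argument: the verb is still dyadic, but the left side carries more than one value.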

Unfortunate naming coincidence which is confusing. Couldn't we just call those unary and binary operators rather?

itishappy · a year ago

It sounds unrelated to the monads popularized by Haskell tutorials. Naming coincidence or am I missing something?

Thanks kindly.

incrudible · a year ago

Naive array programming is not really high performance in my book, because performing a lot of trivial arithmetic on large arrays leads to poor locality and high bandwidth pressure. The better alternative is SPMD, i.e. something like CUDA or ISPC for CPUs. This is possible with some type of JIT if the numpy style of programming is to be maintained, for example tinygrad.
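The bandwidth concern can be pictured with a small sketch. Plain Python lists stand in for large arrays here, and the function names are invented; this is not a benchmark and says nothing about what NumPy, KlongPy, or tinygrad actually do internally. It only contrasts the access pattern of whole-array steps with a fused single pass.

```python
def unfused(a, b, c):
    """Naive array style: every step is a full pass that writes a temporary."""
    tmp1 = [x * y for x, y in zip(a, b)]      # pass 1: read a, b; write tmp1
    tmp2 = [t + z for t, z in zip(tmp1, c)]   # pass 2: read tmp1, c; write tmp2
    return sum(tmp2)                          # pass 3: read tmp2


def fused(a, b, c):
    """Fused style: one pass over the inputs, no temporaries."""
    return sum(x * y + z for x, y, z in zip(a, b, c))


# Hypothetical sample data.
a, b, c = [1, 2, 3], [4, 5, 6], [7, 8, 9]
print(unfused(a, b, c), fused(a, b, c))
```

Both return the same value; the difference is that the first version streams two extra array-sized temporaries through memory, which is the cost a JIT or SPMD-style compiler tries to remove.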

mlochbaum · a year ago

Memory traffic is certainly the biggest problem that the array paradigm presents for implementation, yes. I'd quibble with calling that "poor locality": when working with large arrays, if any part of a cache line is accessed it's very likely the entire line will be used in short order, which is the definition of good locality as I understand it. The issue is simply high memory use and a large number of accesses.

I think it's an oversimplification to say SPMD is a better model full stop. Array programming has the advantage that the implementer optimizes a specific function, and its interaction with the rest of the program is much simpler: all the data is available at the start, and the result can be produced in any order. So things like multi-pass algorithms and temporary lookup tables are possible, particularly valuable if the program isn't spending that much time on arithmetic but rather "heavy" searching and sorting operations. Not that I claim NumPy or CuPy do a good job of this. Sections "Fusion versus fission" and "Dynamic versus static" are relevant here, I think: https://mlochbaum.github.io/BQN/implementation/versusc.html#...

cl3misch · a year ago

If you like high-performance array programming a la "numpy with JIT", I suggest looking at JAX. It's very suitable for general numeric computing (not just ML) and has a very mature ecosystem.

https://github.com/jax-ml/jax

CyberDildonics · a year ago

You are absolutely right, naive array programming might be much faster than raw python but it will never be high performance because you can't use caches and memory bandwidth effectively.

bunderbunder · a year ago

The unstated major premise here is that you're working with data that's big enough that this becomes your major bottleneck.

There's also a subset - possibly even a silent majority - of problems where you're doing fairly intensive calculation on data that fairly comfortably fits into a modern CPU cache. For those, I'd still expect SIMD to be a great choice from a performance perspective.