Perl Data Language: Scientific Computing with Perl

sivoais · 5 years ago

Hi, PDL core dev here. Feel free to ask me anything about it.

The last release wasn't in February, it was just last week! <https://metacpan.org/release/ETJ/PDL-2.050>.

I agree with many of the commenters here that Python has a lot of great libraries and is a major player for scientific computing these days. I also code in Python from time to time, but I prefer the OO modelling and language flexibility features of Perl.

Speaking for myself and not the other PDL devs, I don't think this is an issue for Perl-using scientists as Perl can actually call Python code quite easily using Inline::Python. In the future I will be working on interoperability between the two better specifically for NumPy / Pandas. This is also the path being taken by Julia and R.

enriquto · 5 years ago

Looks great! I used perl a lot when I started programming and it is lovely to see it alive and kicking with scientific computing!

As a "heavy" user of scientific computing, I must say that the name "data language" is a bit disheartening... It echoes of useless "data frames" not of cool "sparse matrices" which is what I actually need. Does PDS support large sparse matrices? I grepped around the tutorial and the book and the word "sparse" is nowhere to be found. Yet it is an essential data structure in scientific computation. Are there any plans to, e.g., provide an interface into standard libraries like suitesparse?

sivoais · 5 years ago

I plan to improve that, but will need to figure out the design (perhaps with something from Eigen). There is <https://metacpan.org/pod/PDL::CCS>, but it is not a real full PDL ndarray and is actually a wrapper around the PDL API.

1996 · 5 years ago

Very interesting, thank you!

Do you have a tutorial and some examples? If not, could you write one?

I sometimes deploy perl code at large scale for financial computing where only performance matters: with XS the overhead is low while gaining language flexibility.

Even in 2021, this is usually faster than alternatives by orders of magnitude.

PDL could be a good addition to our toolset for specific workloads.

sivoais · 5 years ago

Here is a link to the PDL book <http://pdl.perl.org/content/pdl-book-toc.html>.

I can share some examples of using PDL:

- Demos of basic usage <https://metacpan.org/release/ETJ/PDL-2.050/source/Demos/Gene...>

- Image analysis <https://nbviewer.ipython.org/github/zmughal/zmughal-iperl-no...> (I am also the author of IPerl, so if you have questions about it, let me know. My top priority with IPerl right now is to make it easy to install.)

- Physics calculations <https://github.com/wlmb/Photonic>

- Access to GSL functions for integration and statistics (with comparisons to SciPy and R): <https://gist.github.com/zmughal/fd79961a166d653a7316aef2f010...>. Note how PDL can take an array of values as input (which gets promoted into a PDL of type double) and then returns a PDL of type double of the same size. The values of that original array are processed entirely in C once they get converted to a PDL.

- Example of using Gnuplot <https://github.com/PDLPorters/PDL-Graphics-Gnuplot/blob/mast...>.

---

Just to give a summary of how PDL works relative to XS:

PDL allows for creating numeric ndarrays of any number of dimension of a specific type (e.g., byte, float, double, complex double) that can be operated on by generalized functions. These functions are compiled using a DSL called PP that generates multiple XS functions by taking a signature that defines the number of dimensions that the function operates over for each input/output variable and adding loops around it. These loops are quite flexible and can be made to work in-place so that no temporary arrays are created (also allows for doing pre-allocation). The loops will run multiple times over that same piece of memory --- this is still fast unless you have many small computations.

And if you do have many small computations, the PP DSL is available for the user to use as well so if they need to take a specific PDL computation written in Perl, they can translate the innermost loop into C and then it can do the whole computation in one loop (a faster data access pattern). There is a book for that as well called "Practical Magick with C, PDL, and PDL::PP -- a guide to compiled add-ons for PDL" <https://arxiv.org/abs/1702.07753>.

---

I'm also active on the `#pdl` IRC channel on <https://www.irc.perl.org/>, so feel free to drop by.

zengargoyle · 5 years ago

Now you just need to port it to Raku. (Maybe you have).

sivoais · 5 years ago

I would really like to do some scientific computing in Raku. It has crossed my mind that I can maintain both Perl5 and Raku ports of some of the library code I'm writing. I just haven't worked through the tooling.

audit · 5 years ago

Thank you for your work. I used PDL early 2000 when working in bioinformatics area.

I did not know at the time any of the specialized languages, so intially approaching the project -- I was very concerned on how to deal with matrices, but as I got to understand the PDL better -- i was getting better and better at it.

If I may suggest someting (this is based on the old experience though) --

a) some 'built-in' way to seamlessly distribute work across processes and machines.

b) some seamless excel and libreoffice calc integration.

Meaning that I should be able to 'release' my programs as Excel/Libre Office files.

Where I code in PDL but leverage Spreadsheet as a 'UI' + calc runtime.

So that when I run my 'make' I get out a Excel/Libre office file that I can version and distribute into user or subsequent compute environments.

Where the PDL code is translated into the runtime understood by the spreadsheet engine.

I know this is a lot to ask, and may be not in the direction you are going, but wanted to mention still.

sivoais · 5 years ago

Good ideas!

a)

A built-in way would be good. There is some work being explored in using OpenMP with Perl/PDL to get some of that. In the mean time, there is MCE which does distribute across processes and there are examples of using this with PDL <https://github.com/marioroy/mce-cookbook#sharing-perl-data-l...>, but I have not had an opportunity to use it.

b)

Output for a spreadsheet would be difficult if I understand the problem correctly. This would more about creating a mapping of PDL function names to spreadsheet function names --- not all PDL functions exist in spreadsheet languages. It might be possible to embed or do IPC with a Perl interpreter like <https://www.pyxll.com/>, but I don't know about how easy that would be to deploy when distributing to users.

Am I understanding correctly?

Interestingly enough, creating a mapping of PDL functions would be useful for other reasons, so the first part might be possible, but the code might need to be written in a certain way that makes writing the dataflow between cells easier.

aduitsis · 5 years ago

I can see in the page that last PDL release was on February, okay. Probably the presence of activity indicates that there are people using PDL, and more than that, maintaining it. Since PDL existed for many years, I would speculate that these people presumably didn't jump into PDL yesterday. There can very well be large codebases using it for a long time.

All the comments suggesting that this somehow shouldn't be and that people should move away from PDL, are depressing in the way that, in a split second, years of effort and thousands lines of code that are probably doing good work are dismissed just like that.

If a bank is still using Cobol, it is interesting and a testament to how a Cobol programmer can still make a good living on it. But if a scientist is using Perl to carry out calculations, this is somehow bad?

ajsnigrutin · 5 years ago

This is the sentiment I've been seeing online, especially here.

1/3 of the posts here are "<old tool that already exists in a stable, mature codebase> in {rust|go|whatevernewlanguagecomesnextweek} released v0.0.1"

Perl is a great language, that does it's job for many, many things, especially with CPAN, and it has been doing so for years. You can buy a 20yo book on perl, and 99.99% of the example code from that book still works, and same goes for projects from that era (which cannot be said for python, where developers and distro mainanters seem to enjoy removing usable, mature projects, just because they're written for python2.7 and incompatible with 3+).

If I have to write a script once, that I can forget about, and just expect it to run for years, perl will always be my first choice.

anthk · 5 years ago

>99.99% of the example code

Orelly's Perl books from the CD bookshelf still work.

Just declare a variable with "my =" in front of it (just once), and everything will work as usual:

old:

    $num = 3;
    print $num;

new:

    my $num = 3;
    print $num;

ocschwar · 5 years ago

According to Google, a one off I wrote in Perl in 1998 is still in use at the lab I wrote it in.

lmm · 5 years ago

> If a bank is still using Cobol, it is interesting and a testament to how a Cobol programmer can still make a good living on it.

No it isn't. It's a testament to how backward that bank is. You'll see upvoted contrary takes here, sure, but that's because middlebrow contrarianism is a good way to get upvoted on HN.

ocschwar · 5 years ago

I loved using Perl for projects in the early days of the Web. For anything even remotely expressive or artistic, Perl was the way to go. But if you want to communicate scientific insight, using the write-only language is something I have to maintain my doubts. But, if Inline::Python works as well as the comments above indicate, then I might be tempted again. Number crunching in Python, pretty pictures and presentation in Perl.. Hmm.

worik · 5 years ago

How is Perl write only and Python not?

I am biased. I love Perl and hate Python. Makes me feel very old....

TurboHaskal · 5 years ago

What can I say, Perlphobia is real.

stabbles · 5 years ago

It's unlikely to perform well, so it may not be the right tool for the job.

natch · 5 years ago

Can't speak to this library but in general Perl stands out as being extremely performant, if that is the goalpost you want to go with.

Deleted Comment

MontyCarloHall · 5 years ago

> If a bank is still using Cobol, it is interesting and a testament to how a Cobol programmer can still make a good living on it. But if a scientist is using Perl to carry out calculations, this is somehow bad?

Yes. The negative externalities of doing scientific analyses in Perl are much greater than a bank having a legacy COBOL codebase. Only a handful of engineers within the bank will ever see that COBOL codebase. Science is globally collaborative; many people at many different institutions across the world would have to deal with some idiosyncratic scientist’s decision to write their analysis in Perl.

Also, the bank only has a COBOL codebase because it’s reluctant to make major changes to an extremely important system that’s been working flawlessly since the early 60s. There’s absolutely no reason to start a totally brand new project in Perl (or COBOL, for that matter), when far superior alternatives exist.

nanis · 5 years ago

> idiosyncratic scientist’s decision to write their analysis in Perl.

In my experience, looking at "scientist" produced code, the programming language matters very little. It is not hard to produce something completely inscrutable and non-replicable in Python and R the same way it's been done for ages using SAS, Stata, MatLab etc.

I still see people rolling out their regression computations using matrix inversion and calculating averages as `sum(x)/n`.

I really like PDL when I can use it. I have had problems building it from source on Windows in the past, but it is actually a very well thought out library.

Also worth mentioning, you can get a lot of mileage out of GSL[1].

[1]: https://www.gnu.org/software/gsl/

caslon · 5 years ago

Perl is shipped in virtually every Linux distribution; it's much closer to standard than, say, Node.

worik · 5 years ago

"The negative externalities of doing scientific analyses in Perl"

What do you mean?

sega_sai · 5 years ago

I'm afraid this is ~ 10-15 years too late. I've used PDL somewhat in ~ 2005-2010 when python didn't have that many packages/numpy, and it did the job and could substitute IDL to some degree. But realistically, right now I don't see any reason to use PDL instead of Python.

bjornjajayaja · 5 years ago

Meh… Python zealots abound. Folks often say the same thing about Fortran but frankly, those are the engines that power ALL scripting languages math features. You’re running on C/C++/FORTRAN. If one needs better text-parsing features on top of that: use Perl; if you use a Python environment: use Python.

Perl is superior in many ways and I think using it for data-exploration still has tons of merit.

Folks “shoot themselves in the foot” when they don’t understand the language. In Perls case: list/scalar context is usually the culprit, but it is quite easy to understand. It’s more flexible and concise which in many ways makes Perl better at exploratory programming.

sega_sai · 5 years ago

I (and many in science) switched from C/PDL/IDL to Python in last 10 years because it was is the best tool, not for zealotry.

Python vs Perl for science has nothing to do with c/c++/fortran. C/C++/fortran have their own place for science computations. Perl does not. For anything involving any kind of numerics Python will be faster/better tested/will have more libraries/will have better visualisation capabilities, so there is no need for perl.

Sure if you are only working with strings, you can use perl, but that's hardly scientific computing (not at least the field that I work in).

mr_toad · 5 years ago

> In Perls case: list/scalar context is usually the culprit

That’s just the surface. Working with matrices means nested data structures, and Perls syntax for nested data structures makes the list/scalar context look like child’s play.

7thaccount · 5 years ago

The question isn't about Perl vs Python (which I still think is a clear winner for Python, although there are some uses where I could perfectly understand reaching for Perl first).

The question is whether Perl's PDL is better than ScyPy/Numpy/Pandas/Cython, the Spyder IDE, bunch of plotting libraries...etc.

mangecoeur · 5 years ago

People like other people using python because it makes it easier to share and exchange and built a community of knowledge and libraries. Science is a lot about ease of collaboration.

dolodoot · 5 years ago

> right now I don't see any reason to use PDL instead of Python.

I like python a lot and actually i wish less scientists used it. Perl/PDL is in my opinion much better suited to this (understandable) just getting the job done approach i often find in sciene or other areas in which writing software is not the primary goal.

Deleted Comment

flobosg · 5 years ago

Somewhat related, “How Perl Saved the Human Genome Project (1996)” : https://www.foo.be/docs/tpj/issues/vol1_2/tpj0102-0001.html

hoytech · 5 years ago

I really like PDL, even though I don't get to use it much anymore. Here's a fun application, FM radio demodulation in a page or so of pretty straightforward code:

https://hoytech.com/talks/fm-demodulation

anonu · 5 years ago

I think Perl missed the boat... too little way way too late.

Also - they can't even get their SSL certs straight.

I loved Perl - it was certainly my gateway to coding. Regex was beautiful in perl. But I shot myself in the foot so many times with that language.

hatsuseno · 5 years ago

Specifically how can't Perl get their SSL certs straight?

Dead Comment

dima55 · 5 years ago

I used perl and PDL heavily, before moving to Python and numpy. Both have annoying issues, and oddly, their warts are complementary. Particularly, the core API in PDL is miles better than numpy's. Before I could tolerate actually using numpy, I had to write a library to patch away numpy's warts, by effectively writing a PDL compatibility layer. Check it out:

https://github.com/dkogan/numpysane/

Now the core numpy has usable broadcasting, concatenation and basic linear algebra. Kudos to the PDL team for the excellent core design.

denimboy · 5 years ago

See also integration between PDL and MXNet deep learning framework:

   https://mxnet.apache.org/versions/1.8.0/api/perl