oxinabox · 6 years ago
Invenia has been running Julia in our primary production system for over a year (nearly two).

The size of our Julia codebase is about 400,000 lines of code, spread over numerous internal and open source packages. You can cite the "we're hiring" slide from my JuliaCon talk for that https://raw.githack.com/oxinabox/ChainRulesJuliaCon2020/main...

nickjj · 6 years ago
I'd be super interested in having you (or someone at Invenia) on as a guest on a podcast that talks about tech stacks. We'd cover things like what development and deployment is like, which tools you use and why you chose them, etc.

At the moment there's no Julia episodes yet. If you want to come on, head over to https://runninginproduction.com/ and click the "become a guest" button near the top right to get the ball rolling.

LolWolf · 6 years ago
Oh, that would be awesome! I would really love to listen to a bit more detail about the choice of Julia for a company's stack. (I am an academic, so choosing Julia was very easy for me :) but I can imagine this choice could likely be harder outside of basic research.)
7thaccount · 6 years ago
As someone working in power systems (electrical engineer) who programs in Julia a bit for fun, I'm curious about y'all's business model. I've asked before and only gotten something about load forecasting, which I can't see supporting 30 people. Can you shed any light on the business model? Contract work? Do y'all have an actual software product? Please note that I'm not being critical; I'm legitimately curious, as the work y'all do seems very academic, which I tend to see more coming out of universities. 400k LOC is a lot btw, so it seems like y'all are pretty busy!
oxinabox · 6 years ago
The short answer is that you are underestimating the scale. Invenia operates over most of the grids in the continental US. Think less "helping one power plant decide how much to generate, and thus saving it money on wasted fuel," and more "helping thousands..." We're linked more to the system operators than to the individual plants. And those systems move huge amounts of power each day, so even small percentage efficiencies are still big numbers dollar-wise.
literallycancer · 6 years ago
You guys should look into the font size and text colors (contrast) on your website. It's really hard to read.
agumonkey · 6 years ago
Was it a new project using Julia from scratch ?
3JPLW · 6 years ago
Not an Invenian, but my understanding is that they gradually transitioned away from a mix of MATLAB, Python, and C.

https://juliacomputing.com/case-studies/invenia.html

ViralBShah · 6 years ago
People often ask me for examples of non-scientific Julia codebases. My favourite one is Franklin.jl (https://franklinjl.org), a static site generator. This blog post discusses quite a bit more, and it doesn't even talk about all the improvements in integrating with databases and such.
StefanKarpinski · 6 years ago
Another example is that all the infrastructure for serving Julia packages to Julia users in the new 1.5 release is implemented in Julia. This is a content distribution network served by highly concurrent servers over HTTPS. The system interacts with git servers, GitHub APIs, and AWS S3, among many other moving pieces. Code can be found here: https://github.com/JuliaPackaging/PkgServer.jl.

Writing servers in Julia is really pleasant thanks to the clean coroutine-based task model. Under the hood, Julia uses libuv to get efficient high-concurrency I/O with an event loop. But from the user's perspective, you just write simple blocking code and use `@sync` and `@async` to spawn concurrent tasks. If you want to use multiple threads, just use `@spawn` instead of `@async`. This is very similar to Go's highly successful I/O and concurrency model.
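In practice that pattern is just a handful of lines; a minimal sketch (plain Julia, with a hypothetical `fetch_all` helper standing in for real network I/O):

```julia
# Each @async block becomes a lightweight task on the libuv event loop;
# @sync blocks until every task spawned inside it has finished.
function fetch_all(ids)
    results = Vector{Int}(undef, length(ids))
    @sync for (i, id) in enumerate(ids)
        @async begin
            sleep(0.01 * rand())   # stands in for blocking network I/O
            results[i] = 2 * id    # plain blocking-style code, no callbacks
        end
    end
    return results
end

fetch_all(1:5)  # -> [2, 4, 6, 8, 10]
```

Swapping `@async` for `Threads.@spawn` moves the same tasks onto multiple threads, as the comment above describes.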

ViralBShah · 6 years ago
That's a great example. Also, JuliaBox.com (discontinued service) ran on Julia in production for 3 years, and has now been replaced by JuliaHub.com, which is also largely written in Julia.

Several Julia Computing customers also run long running Julia jobs in production, and some are discussed in our Case Studies (https://juliacomputing.com/case-studies/).

bransonf · 6 years ago
My favorite is Genie.

[0] https://genieframework.com/

lordgroff · 6 years ago
I've been thinking about jumping on the Julia ship for a while. I like the LISP-like nature of R and miss its flexibility very much while working in Python (a language I also enjoy, but not particularly for data science). Julia fascinates me, specifically its macro story and speed.

One of the things that still keeps me at bay is the JIT and the pre-compilation in general. It seems like it's still not very easy to actually compile a Julia library or an executable. There is PackageCompiler.jl, but my understanding is that it necessitates pulling in sys.so, which is some 130 MB in size, making it prohibitive for many projects.

ViralBShah · 6 years ago
We are always working on streamlining PackageCompiler.jl and building binaries. Bug reports always appreciated.

I'm curious to understand why the 130MB is prohibitive. Are you looking at embedded or resource constrained environments?

peatmoss · 6 years ago
I know that for me, the documentation and general process of using PackageCompiler.jl is a little daunting in a way that other parts of the ecosystem and docs haven't been.

I was just trying to build a quick Hello World test app the other day, and (this may have been my fault) I hit the threshold of cognitive overhead before deciding to revisit later. This isn't a criticism so much as a recognition that compiling binaries may take a bit more care/attention/learning in Julia than in something like Go, Rust, or even Racket.
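For what it's worth, the flow I was attempting is roughly this (a sketch against PackageCompiler's 1.x `create_app` API; the package name and paths are made up):

```julia
# src/MyApp.jl -- the app package just needs a julia_main entry point.
module MyApp

function julia_main()::Cint
    println("Hello, World!")
    return 0  # process exit code
end

end # module

# Then, from a separate Julia session (commented out: the build takes minutes):
#   using PackageCompiler
#   create_app("path/to/MyApp", "MyAppCompiled")
# ...and the result runs as MyAppCompiled/bin/MyApp.
```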

As for binary sizes, for some serverless deployments smaller binaries can open up a little easier path.

I’ll try and take another run at PackageCompiler and give it a more fair shake. If in that process I find anything constructive to report or submit documentation PRs, I will do so.

Thank you for your work on Julia. Julia was on my radar a few years ago and then dropped off again. I’m excited to see where it has come in that time.

luizfelberti · 6 years ago
In a few use cases, such as big-data processing and things like that, a common pattern is to haul a tiny data-processing kernel to several machines to do an operation on some data they hold, rather than shuffle the data around.

This pattern can be seen in several places and frameworks like Spark & company, and it's also pretty much the foundational idea behind processing data with Joyent's Manta[0]. It always pretty much boils down to "serialize a function, and send it somewhere, to run on the data locally".

Some of these frameworks do it only through native functions (Spark runs serialized JVM procedures, and for Python functions I believe it shells out to an actual Python interpreter); others are completely agnostic to this and work with binaries (Manta).

It'd be pretty hard to do, and have a lot of overhead, if all of those "functions" that I'm shuffling around are, at the lower bound, 130MB in size.

I understand that a lot of this can be done directly at the Julia layer for Julia things (much like Spark does with JVM things), but it kinda limits the applications of this nonetheless.

[0] https://github.com/joyent/manta

lordgroff · 6 years ago
It's more generic than that. For basic general-purpose computing with some calculation and where performance is of some importance, I can design and ship the whole thing via something like Lazarus in 5-10 MB, or -- from my understanding -- 130+ MB in Julia. I understand that this might be asking Julia to break a bit too far out of its intended use case, but it would be nice.
oblio · 6 years ago
I don't know about him, but I come from a somewhat weird angle: I'd want to use Julia to replace Python for command-line, Opsy/DevOps-type tools. Go has quite a few things I'm not a big fan of at the moment, so Julia seems the perfect candidate. But right now its deployment situation is too complicated; I just want a static binary or a self-contained zip.
drewm1980 · 6 years ago
From the PackageCompiler.jl description: "Creating custom sysimages for reduced latency when working locally with packages that has a high startup time." That sounds really far from statically compiling everything with a compiler that will strip out all the code you don't call, inline aggressively, etc... Static languages at least let us ~pretend that even if our dependencies are bloated, our binaries will be elegant, tightly optimized things :)
caleb-allen · 6 years ago
In a recent talk about PackageCompiler, the speaker said that an image compressed into a .tar.gz is ~80 MB.
adamnemecek · 6 years ago
Julia is next gen. The whole experience is so much more pleasant than anything in the space. The interop is also crazy. How is it possible that you can call MATLAB, Python, C++, Fortran, and Rust from one language?

I legit think that even if you are using, say, PyTorch, you are better off writing your code in Julia and using the Python interop.
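The C/Fortran story really is zero-glue; a tiny sketch calling libc's `strlen` directly (nothing hypothetical beyond the variable name):

```julia
# ccall jumps straight to the C symbol -- no wrapper, no build step.
n = ccall(:strlen, Csize_t, (Cstring,), "hello")
Int(n)  # -> 5

# Python goes through the PyCall package (not run here): pyimport("numpy")
# hands you the live module object; MATLAB, C++, and Rust have analogous bridges.
```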

amkkma · 6 years ago
Using pytorch/keras/TF from Julia is possible, and even pleasant: https://twitter.com/aureliengeron/status/1277751121440698368

Though I much prefer Flux.jl

UncleOxidant · 6 years ago
In my experience Flux.jl has a ways to go before it reaches PyTorch performance. Apparently this has something to do with some very efficient GPU kernels that PyTorch uses - I wonder if it's possible for Flux to "borrow" those GPU kernels? It's possible to write GPU kernels in Julia, so hopefully this will help close the gap.
smabie · 6 years ago
Yeah, the quality of the language is pretty incredible. Especially compared to Python, which has been a disaster for as long as I can remember.

I think the ecosystem could be a bit better, but that's more of a personal opinion than a serious concern since, like you alluded to, the interop story is insanely good.

caleb-allen · 6 years ago
I think Python is less a disaster than it is a tool being used for something it was not designed to do.

It's a pretty good scripting language, and it's a "good enough" interface through which implementers of C packages can expose powerful features, but that can only go so far. Users can't easily compose the powerful features that library designers didn't intend.

Julia, on the other hand, was designed for the explicit purpose of high-performance scientific computing, with a syntax informed by the tools people already knew (Python, MATLAB). It's no wonder it's so much easier to use.
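A toy illustration of that composition point in Julia (the `Celsius` type is hypothetical, not from any real package):

```julia
# A user-defined type slots into generic Base code via multiple dispatch.
struct Celsius
    deg::Float64
end
Base.:+(a::Celsius, b::Celsius) = Celsius(a.deg + b.deg)

# sum() from Base now composes with the new type, unmodified:
total = sum([Celsius(20.0), Celsius(1.5), Celsius(0.5)])
total.deg  # -> 22.0
```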

adamnemecek · 6 years ago
The ecosystem is moving very rapidly though. What do you think is missing?
ablekh · 6 years ago
While being a big fan of Julia in general and recognizing the very significant progress in both language and ecosystem development [kudos to Julia's core team as well as the numerous contributors to the language and its growing ecosystem!], I respectfully disagree with the post author's conclusion in its entirety.

For the scientific computing and advanced machine learning domains, Julia is certainly a breath of fresh air and a no-brainer decision. However, as a general-purpose language and technology stack, I think Julia still has a long way to go. This is mostly due to the relative immaturity of that part of the ecosystem; spotty - and sometimes even non-existent! - documentation (of course, except for the language itself and quite a limited number of core and popular packages); and some other issues, including tooling, development/compilation performance, and a limited pool of skilled developers.

So, from a startup founder perspective [who has to select the optimal, risk-minimizing, platform stack], despite Julia's many attractive features (including powerful meta-programming facilities - in my case, for potential DSL development), I'm now leaning toward Python and .NET ecosystems. Both offer very mature and large package ecosystems along with comprehensive tooling support and incomparably larger pool of skilled developers. Additionally, .NET offers stability of consistent development improvements, backing of a major corporation [no acquisition risk] along with support for modern enterprise-focused architectural practices and patterns, including DDD and multi-tenancy.

avasthe · 6 years ago
This.

Not trolling. I don't really know why people like to shoehorn a purpose-oriented language into all domains. E.g., Julia for scientific computing, Rust for low-level programming. They would be great if they focused on those "strong zones".

ddragon · 6 years ago
I read this article not as "Julia is ready to replace the all-purpose languages used in business," but as "Julia is ready to deploy its scientific computing into production environments" (as opposed to just local/hobby tasks and academic environments, which were the early focus). No one is recommending that people build their small e-commerce site in Julia (or perhaps Rust) unless they're in it for the fun of the language (or their e-commerce happens to be a special case that plays to either language's strengths). But having a solid web deployment story is important to both regardless, be it Julia serving live dashboards or integrating your scientific models into your microservice architecture to serve them directly to clients, or Rust for embedded web servers on IoT and other devices. For a language to be great in one domain, it has to be at least good at everything around it for when you need to connect that domain to the real world.

And Julia's creators do focus on its strong zone, with all the work on TPUs, HPC, and parallel computing; so does the community, with the stuff presented at JuliaCon like interactive reproducible notebooks (Pluto.jl) or data dashboards (Stipple.jl), both using the web domain to improve the scientific computing domain.

pjmlp · 6 years ago
Apparently many have issues being polyglot and would rather use a hammer for all kinds of nails, being a "Tech X Developer".
andi999 · 6 years ago
I can currently see Julia only as a replacement for Python, with the biggest advantage that fast modules can be written in-language, so you don't fall back to C. So in a sense it could solve the two-language problem.

But then for my application space there are problems:

- compilation seems possible now, but one has to be careful of the GPL (like in the FFT)

- no way to turn off GC (so byebye real-time possibilities).

- what about GUI design? Heard some mixed messages about it

eigenspace · 6 years ago
> - no way to turn off GC (so byebye real-time possibilities).

While Julia does not offer semantic guarantees that you can avoid the GC, it's actually quite possible and easy to write code where you're manually managing all your memory and the GC is never invoked.
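Concretely, "manually managing memory" usually just means preallocating buffers and mutating in place; a sketch (function and variable names are illustrative):

```julia
# The hot loop touches only preallocated buffers, so steady-state calls
# perform zero heap allocations and the GC has nothing to collect.
function axpy!(y::Vector{Float64}, a::Float64, x::Vector{Float64})
    @inbounds for i in eachindex(y, x)
        y[i] += a * x[i]
    end
    return y
end

x = rand(1000); y = zeros(1000)
axpy!(y, 2.0, x)             # first call compiles (and allocates)
@allocated axpy!(y, 2.0, x)  # typically reports 0 bytes after warm-up
```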

> - what about GUI design? Heard some mixed messages about it

There's a lot of promising things happening with Makie.jl, Stipple.jl, Dash.jl, Pluto.jl and others. This space is still a little immature in Julia, but it's progressing fast.

ChrisRackauckas · 6 years ago
JuliaRobotics is a pretty good example of this: they eliminated all GC calls from the loops in order to deploy on an Atlas robot and... that was 2018.
andi999 · 6 years ago
I know there are workarounds, but when you do not have guarantees, the phrase 'production ready' is not the first that comes to mind. (Of course, in the problem spaces of MATLAB and Python, and probably R, Julia is probably on par.)
ViralBShah · 6 years ago
The FFTW package is no longer included in the stdlib. The only GPL dependency that is shipped in there is SuiteSparse.
andi999 · 6 years ago
So what non-GPL FFT can you use? (If your program needs one and you do not want the GPL or to pay for FFTW.)
oblio · 6 years ago
I'm also interested in the packaging/deployment story. Can you easily make a zip with everything built-in and 0 OS dependencies? Can you ship a single, statically linked binary?
dnautics · 6 years ago
Having followed it since the 0.4 era, I'm excited that Julia is production-ready (though I have professionally moved to Elixir). Since I'm working in ML infra, I'm interested in orchestrating Julia jobs (via Elixir); what I'm less clear on is whether there are good resources on how to deploy Julia, keep dependencies sane, and pin memory usage. Or does one just go straight to containers?
staticfloat · 6 years ago
Yep, to build on Stefan's sibling comment, we deploy a few services through Docker following the flow <merge on GitHub> -> <build on Docker Hub> -> <deploy through Watchtower>, which has been working really well for us.

Docker has some nice infrastructure that we like, such as the ability to set memory limits, automatically restart processes on crash/if a healthcheck goes sour/on reboot, easily spin up different versions of the same service (based on which image you tag to a name), etc... Containerization has done a lot for our ability to bring servers up on heterogeneous hardware easily and quickly.

As an example, this repository [0] is deployed on 10 machines around the world, providing the pkg servers that users connect to in geographically-disparate regions. The README on that repository is a small rant that I wrote a while back that walks through some of the decisions made in that repository and why I feel they're good ones for deployment.

To answer your specific questions:

* How to deploy Julia: we use docker containers and watchtower [1] to automatically deploy new versions of Julia.

* Keep dependencies sane: it's all in the Pkg Manifests. We never do `Pkg.update()` on our deployments, we only ever `Pkg.instantiate()`.

* Pin memory usage: We use docker to apply cgroup limits to kill processes with runaway memory usage [2]. This is only triggered by bugs (normal program execution does not expect to ever hit this) but bugs happen, so it's good to not have to restart your VM because you can't start a new ssh session without your shell triggering the OOM killer. ;)

[0] https://github.com/JuliaPackaging/PkgServerS3Mirror [1] https://github.com/containrrr/watchtower [2] https://github.com/JuliaPackaging/PkgServerS3Mirror/blob/c6a...
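Put together, the Dockerfile for such a service stays short; a hypothetical minimal version (not the PkgServer's actual Dockerfile; the image tag and paths are illustrative):

```dockerfile
FROM julia:1.5

WORKDIR /app
# Copy the pinned environment first so the dependency layer caches well.
COPY Project.toml Manifest.toml ./
# instantiate() installs exactly the Manifest's versions; never update() here.
RUN julia --project=/app -e 'using Pkg; Pkg.instantiate()'

COPY src/ src/
CMD ["julia", "--project=/app", "src/server.jl"]
```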

StefanKarpinski · 6 years ago
With a bit more detail, this would make a great blog post!
dnautics · 6 years ago
thank you very much for the tips!
StefanKarpinski · 6 years ago
For services we deploy at Julia Computing, we use docker containers since that's just the way devops is done these days, but we also use manifest files to pin all our dependencies which almost makes dockerizing things redundant. It doesn't hurt to have a couple of layers ensuring reproducibility.
dnautics · 6 years ago
Thanks! I'm also very excited by pluto.jl, looks fantastic. I'll probably be offering public machine learning vms with Pluto builtin by the end of the year.
amacbride · 6 years ago
I really enjoyed the workshops and the talks; I thought the SciML and MLJ talks were particularly good.

Everything is available on their YouTube channel:

https://www.youtube.com/playlist?list=PLP8iPy9hna6Tl2UHTrm4j...