I only ever find reproducible notebooks on the internet. It is really rare to get them from coworkers, and if they aren't trained as developers it is almost impossible. Their solution to this problem looks efficient, simple, and brilliant:
> Writing Polynote’s code interpretation from scratch allowed us to do away with this global, mutable state. By keeping track of the variables defined in each cell, Polynote constructs the input state for a given cell based on the cells that have run above it. Making the position of a cell important in its execution semantics enforces the principle of least surprise, allowing users to read the notebook from top to bottom. It ensures reproducibility by making it far more likely that running the notebook sequentially will work.
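For anyone who wants the idea in code, here is a minimal Python sketch of how I read that paragraph. It is not Polynote's implementation: `Cell`, `input_state`, and `run_cell` are hypothetical names, and the rule that a cell's input is the merged outputs of the cells above it is my interpretation of the quote.

```python
# Illustrative sketch only. Cell, input_state and run_cell are hypothetical
# names chosen for this example, not part of Polynote.
from dataclasses import dataclass, field


@dataclass
class Cell:
    source: str
    outputs: dict = field(default_factory=dict)  # names this cell defined


def input_state(cells, index):
    """Merge the outputs of every cell above `index`, top to bottom."""
    state = {}
    for cell in cells[:index]:
        state.update(cell.outputs)
    return state


def run_cell(cells, index):
    """Run one cell in a fresh namespace built only from the cells above it."""
    env = input_state(cells, index)
    inherited = dict(env)
    exec(cells[index].source, env)
    # Record only the names this cell introduced or rebound.
    cells[index].outputs = {
        name: value
        for name, value in env.items()
        if name != "__builtins__"
        and (name not in inherited or inherited[name] is not value)
    }


cells = [Cell("x = 1"), Cell("y = x + 1"), Cell("z = x + y")]
for i in range(len(cells)):
    run_cell(cells, i)
```

With this model, deleting the first cell removes `x` from the input state of everything below it, so re-running `y = x + 1` raises a `NameError` instead of silently reusing an old value.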
Thanks for the kind feedback. It's a young project to be honest, but I'm pretty proud of what we've done with only two contributors so far. With community participation I think we could support many more languages pretty quickly!
I have always wished for reproducibility. Thanks for taking up this feature. How do you handle aliasing and references inside objects? Suppose I have
# Cell 1
a = [1, 2, 3]
b = (a, True)
# Cell 2
b[0][0] = 5
# Cell 3
print(sum(a))
Now if I change Cell 2 to
# Cell 2'
b[0][0] = 4
and execute, Cell 3's result becomes stale. Do you track such dependencies? Would really love to read more about the underlying implementation.
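To make the question concrete, here is the same scenario as a self-contained plain-Python sketch. It says nothing about what Polynote does internally; `run_from_top` is a hypothetical helper that just executes the cells in order in a brand-new namespace, and I store the sum in `total` instead of printing it so it can be inspected.

```python
def run_from_top(cell_sources):
    """Execute the cells in order in a brand-new namespace and return it."""
    env = {}
    for source in cell_sources:
        exec(source, env)
    return env


cell_1 = "a = [1, 2, 3]\nb = (a, True)"
cell_3 = "total = sum(a)"

# Cell 2 never assigns to `a` or `b`; it only mutates the list through the alias b[0].
original = run_from_top([cell_1, "b[0][0] = 5", cell_3])
edited = run_from_top([cell_1, "b[0][0] = 4", cell_3])

print(original["total"], edited["total"])  # 10 9
```

Cell 2 defines no new variable, so tracking only the names each cell defines would not reveal that Cell 3 depends on it.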
According to the article, the most interesting feature compared to Jupyter is the absence of hidden state: if you delete a cell, the variables it set are gone. You can also mix languages, so variables set by previously executed cells are accessible from cells in another language.
Personally, I'm looking forward to trying out the SQL support. I haven't seen an elegant solution for SQL notebooks in Jupyter, it was always second-class via Python or some such. Or have I missed something?
Interesting. Since it seems to be implemented in a JVM language and a screenshot shows "Scala" as a supported language, I'm guessing at least all the JVM languages are supported (personally hoping for Clojure), but I can't seem to find a list of supported languages anywhere in the post or on the website.
Currently just Scala and Python (via jep). Looking to add more (probably starting with Java and Clojure) but haven't had time yet. There are just two of us working on it so far. PRs welcome!
The SQL support is done through Spark, so it's not particularly novel – Zeppelin for example supports SQL similarly. We've talked about adding a more general SQL interpreter, though. Happy to hear any suggestions about it!
Do you know of any generalized SQL interpreter that allows push-downs to the underlying engine where possible, but can also arbitrate compute resources for post-push-down operations, e.g. merging disparate result sets or making up for features the underlying engines lack?
Closest thing that comes to mind is something like Apache Drill, which coincidentally also uses Apache Calcite as the SQL interpreter.
Also wondering why I would use this over Zeppelin which can support other interpreters like Flink?
I like this as a concept, but the JDK / jep requirements are a bit of a turn-off, personally... I understand they want it to speak Spark, but that's not exactly how I imagined it would work from the name or the "polyglot notebook" description.
While the reproducibility problem is definitely an issue, I'm not sure it's such a big issue that I'd switch to a whole different notebook solution for it. For most notebook scenarios, running from scratch works fine to ensure it reproduces. Apart from this one feature, BeakerX does all the same things and fits a lot better into the existing Jupyter ecosystem.
To be clear, we're not out to supplant Jupyter. Anybody who's happy with their Jupyter setup will likely find little value in Polynote. But it has plugged some gaps we've had in our Scala ML research team at Netflix, so we thought others might see some value as well.
Somewhat off-topic, but what's with the lambda replacing the "n" letter? I'm no expert in Greek but I thought lambda was the equivalent to the letter "l"...
The logo was hastily designed by an amateur (me). I figured most people would figure it out, pedantic people would complain, and we'd all have a good time :)
We've had some better options contributed in the past couple of weeks, but as long as we're going to change it I didn't want to rush that. So we stuck with my questionable typographic treatment for the blog post.
See, we tried that, and to me it just looked like "ponynote". So far everyone who's mentioned "polylote" has been a current or former physicist, so maybe there's an interesting correlation there...
It does! Monaco is one of the many awesome open source libraries that made Polynote possible. We'll be discussing that at Scale by the Bay; check out our talk if you're going!
It seems like the tool was mainly invented to deal with the issue of hidden state in notebooks, but I honestly don't see what the big deal is. Hidden state in Jupyter notebooks is a gotcha that you can learn to deal with extremely quickly. I've been a Jupyter notebook user for several years, so I haven't had this problem often in recent memory, but I've led workshops where we teach users how to use the notebook. Inevitably hidden state issues come up, but students very quickly learn that restarting the kernel is a necessary part of the workflow and figure out when they need to do it.
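For readers who haven't hit it, the gotcha looks roughly like this in plain Python. This is only a simulation under my own assumptions: the `shared` dict stands in for a kernel's global namespace and `fresh` stands in for a restarted kernel; neither is a Jupyter API.

```python
shared = {}  # stands in for a running kernel's global namespace

exec("threshold = 10", shared)          # a cell that is later deleted from the notebook
exec("result = threshold * 2", shared)  # still works: the name survives in the kernel

fresh = {}  # stands in for "restart kernel and run all" without the deleted cell
try:
    exec("result = threshold * 2", fresh)
    restart_ok = True
except NameError:
    restart_ok = False  # the notebook no longer reproduces from a clean start
```

Restarting the kernel is what surfaces the missing definition, which is why it ends up being part of the workflow.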
> Writing Polynote’s code interpretation from scratch allowed us to do away with this global, mutable state. By keeping track of the variables defined in each cell, Polynote constructs the input state for a given cell based on the cells that have run above it. Making the position of a cell important in its execution semantics enforces the principle of least surprise, allowing users to read the notebook from top to bottom. It ensures reproducibility by making it far more likely that running the notebook sequentially will work.
https://nbviewer.jupyter.org/github/friggeri/notebooks/blob/...
https://github.com/jupytercalpoly/reactivepy
https://dataflownb.github.io/
https://github.com/stitchfix/nodebook
What languages are supported by Polynote?
https://github.com/polynote/polynote/blob/08f0751138e2991cf7...
As ever, the best answer is that Notebooks Are Bad, Actually.