Readit News
spiralk commented on Two new Gemini models, reduced 1.5 Pro pricing, increased rate limits, and more   developers.googleblog.com... · Posted by u/meetpateltech
kendallchuang · a year ago
Thanks for the link. That's unfortunate, though perhaps the benchmarks will be updated after this latest Gemini release. Cursor with Sonnet is great, I'll have to give Aider a try as well.
spiralk · a year ago
It is updated, actually: gemini-1.5-pro-002 is this new model.
spiralk commented on Two new Gemini models, reduced 1.5 Pro pricing, increased rate limits, and more   developers.googleblog.com... · Posted by u/meetpateltech
phren0logy · a year ago
Can you link to documentation for Google's LLMs? I searched long and hard when Gemma 2 came out, and all of the LLM offerings seemed specifically exempted. I'd love to know if that has changed.
spiralk commented on Two new Gemini models, reduced 1.5 Pro pricing, increased rate limits, and more   developers.googleblog.com... · Posted by u/meetpateltech
999900000999 · a year ago
Not sure why you're getting downvoted. Anything sent to a cloud-hosted LLM is subject to being publicly released or used in training.

Setting up a local LLM isn't that hard, although I'd probably air gap anything truly sensitive. I like ollama, but it wouldn't surprise me if it's phoning home.

spiralk · a year ago
This is not true. Both OpenAI's and Google's LLM APIs have a policy of not using the data sent over them. It's no different from trusting Microsoft's or Google's cloud to store private data.
spiralk commented on Two new Gemini models, reduced 1.5 Pro pricing, increased rate limits, and more   developers.googleblog.com... · Posted by u/meetpateltech
kendallchuang · a year ago
Has anyone used Gemini Code Assist? I'm curious how it compares with Github Copilot and Cursor.
spiralk · a year ago
The Aider leaderboards seem like a good practical test of coding usefulness: https://aider.chat/docs/leaderboards/. I haven't tried Cursor personally, but I am finding Aider with Sonnet more useful than GitHub Copilot, and it's nice to be able to pick any model API. Eventually even a local model may be viable. This new Gemini model does not rank very high, unfortunately.
spiralk commented on How we made Jupyter notebooks load faster   singlestore.com/blog/how-... · Posted by u/lneves12
epistasis · a year ago
> Not having the outputs tied into the code is actually preferable if the ultimate goal is reproducible science.

What a strange thing to assert, especially as a general overarching truth.

The best reports I have ever seen have matched code and output in the same file. There's never a question of what code generated a plot or a table with a notebook.

With .py files and separate outputs there's far more chance for unreproducible science, it's far messier, and for someone who doesn't appear to respect the organizational capabilities of academic labs, you are condemning them to far more poorly organized outputs.

> Having multiple copies of code

That doesn't have anything to do with notebooks. It's as silly as saying that a Python package is a poor idea because you saw somebody repeat code across multiple places.

> non-version controlled files

Notebooks are no less version controllable than .py files.

> outputs with timestamps and run information

Jupyter notebooks are perfect for this, far superior to a directory of cryptically named outputs that need to be strung together in some order.

> documentation dispersed with questionable organization

Using separate Python files rather than a notebook means that documentation can never be where it needs to be: next to the output. This is one of the ways that Python files are strictly inferior for generating results.

There are roughly two modes for notebooks: exploration with a REPL, and well-documented reports. The best scientific reports I have ever seen are notebooks (or R Markdown output) that are the full report text plus code plus figures.

spiralk · a year ago
> someone who doesn't appear to respect the organizational capabilities of academic labs, you are condemning them to far more poorly organized outputs.

This is not a great way to make your argument, though you are not the only one here making a personal judgment without even knowing my background. These are all issues I have seen firsthand. With most academic labs being funding-limited, the "organizational capabilities of academic labs" seem irrelevant to me. In our field, no one is getting grants to manage code of any kind, .py or .ipynb, and I suspect it's the same at most university labs. It's wasted effort that ultimately takes time away from the actual research that's fundable and publishable. As someone who has been responsible for wrangling people's notebooks in the past, it's enough of a problem that I would encourage removing all .ipynb files.

> That doesn't have anything to do with notebooks. It's as silly as saying that a Python package is a poor idea because you say somebody repeat code across multiple places.

Human factors make Jupyter notebooks lead to the problems I have listed. The issues are most apparent in large groups and over long periods of time. Python and other programming languages already solved most of these problems with git; there isn't another tool that is as elegant and scales from individuals to massive organizations.

> There are roughly two modes for notebooks: exploration with a REPL, and well-documented reports. The best scientific reports I have ever seen are notebooks (or R Markdown output) that are the full report text plus code plus figures.

The REPL functionality is handled by .py cell execution, as I’ve mentioned in other comments. It baffles me how the minimal effort saved by not using separate tools -- one for code, one for documentation -- justifies the issues it introduces.

spiralk commented on How we made Jupyter notebooks load faster   singlestore.com/blog/how-... · Posted by u/lneves12
majormajor · a year ago
Having the outputs recorded alongside specific versions of the code can actually be very valuable.

But since most uses of Jupyter notebooks I've seen don't version control them much at all, it's not as useful in practice often.

spiralk · a year ago
Yeah, Jupyter notebooks don't guarantee anything about the version of code used for a given output. In the real world you can expect everyone in the lab, including all of the students, to be editing Jupyter notebooks at whim. The only way to do this properly would be to have version control of your code, a snapshot of the environment, and to log all of this along with the run that generated the output. This is possible with regular Python using git, proper log files, etc. Jupyter notebooks seem like an extra roadblock.
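A minimal sketch of the "regular Python using git" approach described above: record the commit hash and a timestamp next to each output so a result can be traced to the code version that produced it. The helper names (`run_metadata`, `save_result`) and file names are illustrative, and the sketch assumes the script runs inside a git repository.

```python
import json
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def run_metadata() -> dict:
    """Collect the git commit hash and a UTC timestamp for this run."""
    try:
        commit = subprocess.check_output(
            ["git", "rev-parse", "HEAD"], text=True
        ).strip()
    except (subprocess.CalledProcessError, FileNotFoundError):
        commit = "unknown"  # not in a repo, or git not installed
    return {
        "git_commit": commit,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


def save_result(data: dict, out_dir: Path) -> Path:
    """Write a result file next to a metadata file describing the run."""
    out_dir.mkdir(parents=True, exist_ok=True)
    result_path = out_dir / "result.json"
    result_path.write_text(json.dumps(data))
    (out_dir / "run_info.json").write_text(json.dumps(run_metadata()))
    return result_path
```

A real setup would also snapshot the environment (e.g. a lock file or `pip freeze` output), but even this much makes a directory of outputs reconstructible.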
spiralk commented on How we made Jupyter notebooks load faster   singlestore.com/blog/how-... · Posted by u/lneves12
KolenCh · a year ago
I don’t disagree with anything you said. Jupytext can be a good tool to bridge some of the gap: you pair the ipynb with a py script and can then commit only the py (git-ignoring all ipynb for your collaborators).

Also, while many practices out there are questionable, in alternative scenarios where ipynb didn’t exist, people might have been using something like MATLAB instead. E.g., in my field (physics), there are often experimentalists doing some coding, and ipynb can be very enabling for them.

I think a piece of research should be broken down and worked on by multiple people to improve the state of the project. Some scientists might pass you the initial prototype in the form of a notebook, and others should refactor it into something more suitable for deployment and archival purposes. Properly funding these roles is important, and is lacking but improving (e.g. hiring RSEs).

In my field, the most prominent case where ipynb is shared widely is training. It’s a great application, as that becomes literate programming. In this sense notebooks are highly underused, since literate programming still hasn’t gone mainstream.

spiralk · a year ago
I've looked into Jupytext but ultimately decided to go with pure Python. Most of the practical functionality can be replicated, but I do admit there isn't an easy single-install tool or guide to replace notebooks at the moment.

I think notebooks are a fine tool to introduce people to programming initially, but I'm afraid they don't allow for growth beyond a certain level. You have a good point about funding for those software roles. Perhaps this wouldn't be as big of a concern if there were more software talent in these labs to handle the issues that arise.

spiralk commented on How we made Jupyter notebooks load faster   singlestore.com/blog/how-... · Posted by u/lneves12
Twirrim · a year ago
I use jupyter notebooks at work, not so much for academic stuff, but often to help build and show a narrative to folks, including executives (where leadership is even remotely technical). It's great for narrative stuff, especially being able to emit PDFs and whatnot. I've been in a number of meetings where I've got the code up in Jupyter, sharing the screen, and leadership want us to tweak numbers and see the consequences.

It's great for exploring code and data too, especially situations where I'm really trying to feel my way towards a solution. I get to merrily intermingle rich text narrative and code so I explain how I got to where I got to and can walk people through it (I did that with some experimenting with an SMT solver several months ago, meant that people that had no experience with an SMT solver could understand the model I built).

I'd never use it to share code though. If we get to that stage, it's time to export from jupyter (which it natively supports), and then tidy up the code and productionise it. There's no way jupyter should be the deployed thing.

spiralk · a year ago
That seems like a reasonable way to use Jupyter notebooks, since you have an actual plan to move beyond them when necessary. My issue is mostly with the way they're misused, often by people who are arguably at the top of the field.
spiralk commented on How we made Jupyter notebooks load faster   singlestore.com/blog/how-... · Posted by u/lneves12
epistasis · a year ago
I think there's a fundamental misunderstanding and mismatch between what you want to do and what Jupyter notebooks are for. The distinction is between the code versus the results.

If the code is the end product, sure, use a python package.

But does your .py with `# %%` in it also store the outputs? If not, why even bring this up? A .py output without the plots tied to the code doesn't meet the basic use case.

If the end product is the plot, I want to see how that plot was generated. And a Jupyter notebook is a much much better artifact than a Python package, unless that Python package hard codes the inputs and execution path like a notebook would.

Over the past 20 years of my career I have run into this divergence of use cases a lot. Software engineers seem to not understand the end goals, how it should be performed, and the learnings of the practitioners that have been generating results for a long time. It's hard to protect data scientists from these inflexible software engineers that see "aha that's code, I know this!" without bothering to understand the actual use case at hand.

spiralk · a year ago
Not having the outputs tied into the code is actually preferable if the ultimate goal is reproducible science. Code should be code, documentation should be documentation, and outputs should be outputs. Having multiple copies of important code in non-version-controlled files is not a good practice. Having documentation dispersed with questionable organization in unsearchable files is not a good practice. Having outputs without run information and timestamps is not a good practice. It's easy to fall into those traps with Jupyter notebooks. They might speed up initial setup and experimentation, but I've been working in academic labs long enough to see the downstream effects.
spiralk commented on How we made Jupyter notebooks load faster   singlestore.com/blog/how-... · Posted by u/lneves12
ambicapter · a year ago
The form factor of Jupyter notebooks seems to fit well with peoples workflows though. Looks like you just wish the internals of Jupyter were better architected.
spiralk · a year ago
Imo, the better-architected .ipynb is simply a .py with '# %%' blocks. It does almost everything a .ipynb can do with the right VSCode extensions. Even interactive visualizations can be sent to a browser window or saved to disk with plotly. Though I do wish '# %%' cell-based execution were accessible to more people.

There isn't a single-install tool that "just works" for this at the moment. If editors came with more robust support for it by default, I think the notebook format wouldn't be needed, and people could use regular Python and interactive cell-based Python interchangeably. I've seen important code get buried under collections of Jupyter notebooks across different users, so I have good reason for this. Notebooks simply don't scale beyond a certain complexity.
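To make the '# %%' idea concrete, here is a tiny sketch of the py-percent format: editors like VSCode (with the Python extension) and Spyder treat each `# %%` marker as an individually runnable cell, while the file remains a plain script that executes top to bottom and diffs cleanly under git. The data and computation are purely illustrative.

```python
# %% [markdown]
# A tiny ".py with cells" file: each "# %%" marker below is a cell a
# notebook-style editor can run on its own, but `python thisfile.py`
# still just runs the whole thing in order.

# %% Load and summarize some data (values are illustrative)
values = [3, 1, 4, 1, 5, 9, 2, 6]
total = sum(values)

# %% Report the summary; this cell can be rerun alone after edits
mean = total / len(values)
print(f"n={len(values)} total={total} mean={mean:.3f}")
```

Because it is just Python, the same file can later be refactored into functions and imported elsewhere, which is exactly the growth path notebooks tend to block.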
