jedbrown commented on AI tooling must be disclosed for contributions   github.com/ghostty-org/gh... · Posted by u/freetonik
jedbrown · 6 months ago
Provenance matters. An LLM cannot certify a Developer Certificate of Origin (https://en.wikipedia.org/wiki/Developer_Certificate_of_Origi...) and a developer of integrity cannot certify the DCO for code emitted by an LLM, certainly not an LLM trained on code of unknown provenance. It is well-known that LLMs sometimes produce verbatim or near-verbatim copies of their training data, most of which cannot be used without attribution (and may have more onerous license requirements). It is also well-known that they don't "understand" semantics: they never make changes for the right reason.

We don't yet know how courts will rule on cases like Does v. GitHub (https://githubcopilotlitigation.com/case-updates.html). LLM-based systems are not even capable of practicing clean-room design (https://en.wikipedia.org/wiki/Clean_room_design). For a maintainer to accept code generated by an LLM is to put the entire community at risk, as well as to endorse a power structure that mocks consent.

jedbrown commented on Avoiding AI is hard – but our freedom to opt out must be protected   theconversation.com/avoid... · Posted by u/gnabgib
tkellogg · 9 months ago
Somewhere in 2024 I noticed that "AI" shifted to no longer include "machine learning" and now means something closer to "GenAI", though still broader than that. The term was never strictly defined and has always been shifting, but last year it moved decisively away from covering classical ML. Even fairly technical people recognize the shift.
jedbrown · 9 months ago
The colloquial definitions have always been more cultural than technical, but the divergence has become more acute recently.

> I think we should shed the idea that AI is a technological artifact with political features and recognize it as a political artifact through and through. AI is an ideological project to shift authority and autonomy away from individuals, towards centralized structures of power.

https://ali-alkhatib.com/blog/defining-ai

jedbrown commented on TeX and Typst: Layout Models (2024)   laurmaedje.github.io/post... · Posted by u/fngjdflmdflg
szvsw · a year ago
> I think one of the best ways to overcome the enormous momentum of TeX is to point out its limitations (while still keeping an eye on Typst's limitations), and explain how Typst overcomes them.

One of the other easy ways to overcome it is to provide as many templates as possible for journals. I’ve used LaTeX for years, but would by no means consider myself an expert, as I’ve almost exclusively been able to grab a template from a journal or from my university and then just draft in the relevant blocks, write equations, add figures, and, rarely, add a package. I would guess that there are a huge number of LaTeX users like me out there.

I do all my drafting on Overleaf. I love TeX (and curse my PI whenever he requires that we use Word/365 instead of LaTeX/Overleaf)… but so much of the benefit, for me at least, comes from the fact that templates are readily available for any journal I would want to submit to; my master’s thesis was built in a template provided by my university; etc. I don’t have to deal with any of the cognitive overhead of styling and formatting (except for flowing the occasional figure) and can just focus on drafting.

For me to even consider Typst, it’s pretty much a requirement that some degree of template parity is actively being worked on. The most natural way to approach that would be to sort every journal by impact factor and start working top to bottom; given that so many journals share templates by virtue of being within Elsevier, Springer, etc., it should be straightforward to reach a reasonable degree of parity relatively quickly.

Getting the major publishers to support and offer their Typst templates would make me try it out immediately, for what it’s worth.

jedbrown · a year ago
Many journals require LaTeX due to their post-acceptance pipeline. I use Typst for letters and those docs for which my PDF is the final version (modulo incomplete PDF/A in Typst), but for many journals in my field, I'd need a way to "compile to LaTeX" or the journal would need to implement a post-acceptance workflow for Typst (I'm not aware of any that have).
jedbrown commented on Tim Cook is 'not 100 percent' sure Apple can stop AI hallucinations   theverge.com/2024/6/11/24... · Posted by u/cdme
wumbo · 2 years ago
Generating information without regard to the truth is bullshitting; it doesn’t necessarily imply malicious intent.

For example, this is bullshit because it’s words with no real thought behind them: “if being correct was important, the developers would have taken an entirely different approach”

jedbrown · 2 years ago
If you ask a professional high-stakes questions about their area of expertise in a work context and they are just bullshitting you, it's fair to impugn their motives. Similarly if someone is using their considerable talent to place bullshit artists in positions to make liability-free, high-stakes decisions.

Your second comment is more flippant than mine: even AI boosters like Chollet and LeCun have come around to seeing LLMs as tangential to delivering on their dreams, and that's before engaging with formal methods, verification and validation (V&V), and other approaches used in systems that actually value reliability.

jedbrown commented on Tim Cook is 'not 100 percent' sure Apple can stop AI hallucinations   theverge.com/2024/6/11/24... · Posted by u/cdme
ein0p · 2 years ago
Fabrication implies malicious intent or at least intentional deception. LLMs don’t have any “intent”.
jedbrown · 2 years ago
Their developers have intent. That intent is to give the perception of understanding/facts/logic without designing representations of such things, and with full knowledge that, as a result, the system will routinely be wrong in ways that would convey malicious intent if a human did the same. I would say they are trained to deceive, because if being correct were important, the developers would have taken an entirely different approach.
jedbrown commented on An interview with AMD CEO Lisa Su about solving hard problems   stratechery.com/2024/an-i... · Posted by u/wallflower
cepth · 2 years ago
No, not really?

The customers at the national labs are not going to be sharing custom HPC code with AMD engineers, if for no other reason than security clearances. Nuclear stockpile modeling code and materials science simulations are not being shared with some SWE at AMD. AMD is not “removing jank” for these customers; it’s that these customers don’t need a modern DL stack.

Let’s not pretend that CUDA works, or has always worked, out of the box. There’s forced obsolescence (“CUDA compute capability”). CUDA didn’t even have backwards compatibility for minor releases (.1, .2, etc.) until version 11.0. The distinction between CUDA, the CUDA toolkit, cuDNN, and the actual driver is still inscrutable to many new devs (see the common questions asked on r/LocalLLaMA and r/StableDiffusion).

Directionally, AMD is trending away from your mainframe analogy.

The first consumer cards got official ROCm support in 5.0, and you have been able to run real DL workloads on budget laptop cards since 5.4 (I’ve done so personally). Developer support is improving (arguably too slowly), but it’s improving. Hugging Face, Cohere, MLIR, Lamini, PyTorch, TensorFlow, Databricks, etc. all now have first-party support for ROCm.

jedbrown · 2 years ago
> customers at the national labs are not going to be sharing custom HPC code with AMD engineers

There are several co-design projects in which AMD engineers are interacting on a weekly basis with developers of these lab-developed codes as well as those developing successors to the current production codes. I was part of one of those projects for 6 years, and it was very fruitful.

> I suspect a substantial portion of their datacenter revenue still comes from traditional HPC customers, who have no need for the ROCm stack.

HIP/ROCm is the prevailing interface for programming AMD GPUs, analogous to CUDA for NVIDIA GPUs. Some projects access it through higher-level libraries (e.g., Kokkos and Raja are popular at the labs). OpenMP target offload is less widespread, and there are some research-grade approaches, but the vast majority of DOE software for Frontier and El Capitan relies on the ROCm stack. Yes, we have groaned at some choices, but it has been improving, and I would say the experience on MI250X machines (Frontier, Crusher, Tioga) is now similar to that on large A100 machines (Perlmutter, Polaris). Intel (Aurora) remains a rougher experience.
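To make the CUDA analogy concrete, here is a minimal HIP sketch (my own illustration, not from any DOE code; the axpy kernel and launch sizes are invented for the example). The API mirrors CUDA nearly one-to-one, down to the kernel-launch syntax:

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>
#include <vector>

// Same __global__ qualifier and built-in index variables as CUDA.
__global__ void axpy(int n, double a, const double *x, double *y) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) y[i] += a * x[i];
}

int main() {
  const int n = 1 << 20;
  std::vector<double> hx(n, 1.0), hy(n, 2.0);
  double *dx, *dy;
  // hipMalloc/hipMemcpy are drop-in analogues of cudaMalloc/cudaMemcpy.
  hipMalloc(&dx, n * sizeof(double));
  hipMalloc(&dy, n * sizeof(double));
  hipMemcpy(dx, hx.data(), n * sizeof(double), hipMemcpyHostToDevice);
  hipMemcpy(dy, hy.data(), n * sizeof(double), hipMemcpyHostToDevice);
  axpy<<<(n + 255) / 256, 256>>>(n, 3.0, dx, dy); // y = 3*x + y
  hipMemcpy(hy.data(), dy, n * sizeof(double), hipMemcpyDeviceToHost);
  std::printf("y[0] = %g (expect 5)\n", hy[0]);
  hipFree(dx);
  hipFree(dy);
  return 0;
}
```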

jedbrown commented on Legal models hallucinate in 1 out of 6 (or more) benchmarking queries   hai.stanford.edu/news/ai-... · Posted by u/rfw300
mycall · 2 years ago
Indeed. Humans hallucinate under these definitions.
jedbrown · 2 years ago
The point is that LLMs are never right for the right reason. Humans who understand the subject matter can make mistakes, but they are mistakes of a different nature. The issue reminds me of this from Terry Tao (LLMs being not-even pre-rigorous, but adept at forging the style of rigorous exposition):

> It is perhaps worth noting that mathematicians at all three of the above stages of mathematical development can still make formal mistakes in their mathematical writing. However, the nature of these mistakes tends to be rather different, depending on what stage one is at:

> 1. Mathematicians at the pre-rigorous stage of development often make formal errors because they are unable to understand how the rigorous mathematical formalism actually works, and are instead applying formal rules or heuristics blindly. It can often be quite difficult for such mathematicians to appreciate and correct these errors even when those errors are explicitly pointed out to them.

> 2. Mathematicians at the rigorous stage of development can still make formal errors because they have not yet perfected their formal understanding, or are unable to perform enough “sanity checks” against intuition or other rules of thumb to catch, say, a sign error, or a failure to correctly verify a crucial hypothesis in a tool. However, such errors can usually be detected (and often repaired) once they are pointed out to them.

> 3. Mathematicians at the post-rigorous stage of development are not infallible, and are still capable of making formal errors in their writing. But this is often because they no longer need the formalism in order to perform high-level mathematical reasoning, and are actually proceeding largely through intuition, which is then translated (possibly incorrectly) into formal mathematical language.

> The distinction between the three types of errors can lead to the phenomenon (which can often be quite puzzling to readers at earlier stages of mathematical development) of a mathematical argument by a post-rigorous mathematician which locally contains a number of typos and other formal errors, but is globally quite sound, with the local errors propagating for a while before being cancelled out by other local errors. (In contrast, when unchecked by a solid intuition, once an error is introduced in an argument by a pre-rigorous or rigorous mathematician, it is possible for the error to propagate out of control until one is left with complete nonsense at the end of the argument.)

https://terrytao.wordpress.com/career-advice/theres-more-to-...

jedbrown commented on New Bounds for Matrix Multiplication: From Alpha to Omega   epubs.siam.org/doi/10.113... · Posted by u/beefman
jcranmer · 2 years ago
Strassen's algorithm is rarely used: its primary use, to my understanding, is in matrix algorithms over fields more exotic than the reals or complex numbers, where minimizing multiplies is extremely useful. Even then, Strassen only starts to beat naive matrix multiply when n is in the thousands, and at that size you're probably starting to think about sparse matrices, where the implementation strategy is completely different. But for a regular dgemm or sgemm, it's not commonly used in BLAS implementations.

Algorithms more advanced than Strassen's have even worse cutover points and are never seriously considered.

jedbrown · 2 years ago
The cross-over can be around 500 (https://doi.org/10.1109/SC.2016.58) for 2-level Strassen. It's not used by regular BLAS because it is less numerically stable (a concern that becomes more severe for the fancier fast MM algorithms). Whether or not the matrix can be compressed (as sparse, fast transforms, or data-sparse such as the various hierarchical low-rank representations) is more a statement about the problem domain, though it's true a sizable portion of applications that produce large matrices are producing matrices that are amenable to data-sparse representations.
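For readers wondering what one Strassen level actually does, here is a sketch in C++ with Eigen (my illustration, not the paper's implementation; the function name and the use of Eigen are assumptions for the example). Seven block multiplies replace the classical eight, at the cost of extra block additions, which is where the weaker stability bound comes from:

```cpp
#include <Eigen/Dense>
#include <iostream>

using Eigen::MatrixXd;

// One Strassen level: 7 block multiplies instead of 8. Assumes even n;
// a real implementation recurses and falls back to a tuned dgemm below
// the crossover point (around 500, per the SC '16 paper above).
MatrixXd strassen_once(const MatrixXd &A, const MatrixXd &B) {
  const long n = A.rows(), h = n / 2;
  MatrixXd A11 = A.topLeftCorner(h, h), A12 = A.topRightCorner(h, h),
           A21 = A.bottomLeftCorner(h, h), A22 = A.bottomRightCorner(h, h);
  MatrixXd B11 = B.topLeftCorner(h, h), B12 = B.topRightCorner(h, h),
           B21 = B.bottomLeftCorner(h, h), B22 = B.bottomRightCorner(h, h);
  MatrixXd M1 = (A11 + A22) * (B11 + B22), M2 = (A21 + A22) * B11,
           M3 = A11 * (B12 - B22),         M4 = A22 * (B21 - B11),
           M5 = (A11 + A12) * B22,         M6 = (A21 - A11) * (B11 + B12),
           M7 = (A12 - A22) * (B21 + B22);
  MatrixXd C(n, n);
  C.topLeftCorner(h, h)     = M1 + M4 - M5 + M7;
  C.topRightCorner(h, h)    = M3 + M5;
  C.bottomLeftCorner(h, h)  = M2 + M4;
  C.bottomRightCorner(h, h) = M1 - M2 + M3 + M6;
  return C;
}

int main() {
  // The extra block additions/subtractions are the source of the weaker
  // numerical-stability bound relative to the classical algorithm.
  MatrixXd A = MatrixXd::Random(512, 512), B = MatrixXd::Random(512, 512);
  std::cout << (strassen_once(A, B) - A * B).norm() << "\n";
}
```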
jedbrown commented on Where is Noether's principle in machine learning?   cgad.ski/blog/where-is-no... · Posted by u/cgadski
nostrademons · 2 years ago
In physics, the conserved quantity isn't always energy, and the symmetry isn't always time translation. Invariance under time translation is specifically conservation of energy. Invariance under spatial translation is conservation of momentum, invariance under spatial rotation is conservation of angular momentum, invariance of the electromagnetic field is conservation of current, and invariance of wave function phase is conservation of charge.
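In symbols, the time-translation case is the standard statement (added here for reference, in the usual Lagrangian notation):

```latex
% Time-translation invariance => energy conservation (standard form):
% if the Lagrangian L(q, \dot q) has no explicit time dependence,
\frac{\partial L}{\partial t} = 0
\quad \Longrightarrow \quad
\frac{d}{dt}\left( \sum_i \dot q_i \, \frac{\partial L}{\partial \dot q_i} - L \right) = 0
% along solutions of the Euler--Lagrange equations
%   \frac{d}{dt} \frac{\partial L}{\partial \dot q_i} = \frac{\partial L}{\partial q_i};
% the conserved quantity in parentheses is the energy (Hamiltonian).
```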

I think the analogue in machine learning is conservation over changes in the training data. After all, the point of machine learning is to find general models that describe the training data given, and minimize the loss function. Assuming that a useful model can be trained, the whole point is that it generalizes to new, unseen instances with minimal losses, i.e. the model remains invariant under shifts in the instances seen.

The more interesting part to me is what this says about philosophy of physics. Noether's Theorem can be restated as "The laws of physics are invariant under X transformation", where X is the gauge symmetry associated with the conservation law. But maybe this is simply a consequence of how we do physics. After all, the point of science is to produce generalized laws from empirical observations. It's trivially easy to find a real-world situation where conservation of energy does not hold (any system with friction, which is basically all of them), but the math gets very messy if you try to actually model the real data, so we rely on approximations that are close enough most of the time. And if many people take empirical measurements at many different points in space, and time, and orientations, you get generalized laws that hold regardless of where/when/who takes the measurement.

Machine learning could be viewed as doing science on empirically measurable social quantities. It won't always be accurate, as individual machine-learning fails show. But it's accurate enough that it can provide useful models for civilization-scale quantities.

jedbrown · 2 years ago
> It's trivially easy to find a real-world situation where conservation of energy does not hold (any system with friction, which is basically all of them)

Conservation of energy absolutely still holds, but entropy is not conserved so the process is irreversible. If your model doesn't include heat, then discrete energy won't be conserved in a process that produces heat, but that's your modeling choice, not a statement about physics. It is common to model such processes using a dissipation potential.
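Concretely (a textbook example, not from the comment): for a damped oscillator, the mechanical energy decays at exactly the rate heat is produced, so the total is conserved once heat is part of the model:

```latex
% Damped oscillator: m \ddot x = -k x - c \dot x with damping c > 0.
% Mechanical energy E = \tfrac12 m \dot x^2 + \tfrac12 k x^2 satisfies
\frac{dE}{dt} = \dot x \,(m \ddot x + k x) = -c \, \dot x^2 \le 0,
% while the heat Q generated by friction satisfies \dot Q = c \dot x^2
% (twice the Rayleigh dissipation potential \mathcal{R} = \tfrac12 c \dot x^2),
% so E + Q is constant: entropy grows, but energy is fully accounted for.
```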

jedbrown commented on The KDE desktop gets an overhaul with Plasma 6   lwn.net/SubscriberLink/96... · Posted by u/jrepinc
jedbrown · 2 years ago
What is the state of tiling in Plasma 6 and how does it compare to Pop Shell (https://github.com/pop-os/shell) for GNOME?

u/jedbrown

Karma: 2648 · Cake day: October 13, 2008
About
https://jedbrown.org http://59A2.org