Transformers.js - Readit News

I really liked the suggestion that if it takes off, the web should consider trying to expose something like the OpenXLA intermediate model, which powers the new PyTorch 2.0, TensorFlow, Jax, and a bunch of other top tier ML frameworks.

It already is very well optimized for a ton of hardware (cpus, gpus, ml chips). The Intermediate Representation might already be a web-safe-ish model, effectively self-sandboxing, which could make it safe to expose.

https://news.ycombinator.com/item?id=35078410

nl · 3 years ago

Shouldn't it be possible to build a WebGL backend for OpenXLA?

Edit: There seems to be some progress on a WASM backend for OpenXLA here: https://github.com/openxla/iree/issues/8327

and a proposed WebML working group at W3C: https://www.w3.org/2023/03/proposed-webmachinelearning-chart... that references OpenXLA

rektide · 3 years ago

Making each webapp target & optimize ML for every possible device target sounds terrible.

The purpose of MLIR is that most of the optimization can be done at lower levels. Instead of everyone figuring out & deciding on their own how best to target & optimize for js, wasm, webgl, and/or webgpu, you just use the industry standard intermediate representation & let the browser figure out the tradeoffs. If there is inboard hardware, neural cores, they might just work!

Good to see WebML has OpenXLA on their radar... but also a bit afraid, expecting some half ass excuses why of course we're going to make some brand new other thing instead. The web & almost everyone else has such a bad NIH problem. WASI & web file apis being totally different is one example, where there's just no common cause, even though it'd make all the difference. And with ML, the cost of having your own tech versus being able to re-use the work everyone else puts in feels like a near suicidal decision to make an API that will never be good, never perform anywhere near where it could.

brrrrrm · 3 years ago

I don't think a high level representation is necessary for relatively straightforward FMA extensions (either outer products in the case of Apple AMX or matrix products in the case of CUDA/Intel AMX). WebGPU + tensor core support and WASM + AMX support would be simpler to implement, likely more future proof and wouldn't require maintaining a massive layer of abstraction.
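To make the outer-product vs. matrix-product distinction above concrete, here is a minimal sketch in plain Python. It uses ordinary lists and no hardware intrinsics, and the function names are made up for illustration. It only shows that the two loop orders compute the same product; it says nothing about how any particular chip schedules the work.

```python
from typing import List

Matrix = List[List[float]]


def matmul_inner(a: Matrix, b: Matrix) -> Matrix:
    """Each C[i][j] is a dot product: a chain of multiply-adds over p."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]  # one multiply-add per step
            c[i][j] = acc
    return c


def matmul_outer(a: Matrix, b: Matrix) -> Matrix:
    """C is accumulated as a sum of k rank-1 outer products:
    (column p of A) times (row p of B)."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for p in range(k):
        for i in range(n):
            for j in range(m):
                c[i][j] += a[i][p] * b[p][j]
    return c


A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]

product_inner = matmul_inner(A, B)
product_outer = matmul_outer(A, B)
print(product_inner)                   # [[19.0, 22.0], [43.0, 50.0]]
print(product_inner == product_outer)  # True
```

Both versions issue the same multiply-adds; they differ only in which loop is outermost, which is the part a hardware extension would fix in silicon.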

cromwellian · 3 years ago

The issue is, much of the performance of Pytorch, JAX, et al comes from running a JIT that is tuned to the underlying HW, and come with support for high level intrinsic operations that were either hand-tuned or have extra hardware support, especially ops dealing with parallelizing computation across multiple cores.

You'd probably end up representing these as external library function calls in WASM, but then the WASM JIT would have to be taught that these are magic functions that are potentially treated specially, so at that point you're just embedding HLO ops as library funcs, and then embedding an HLO translator into the WASM runtime. I'm not sure that's any better.

By analogy, would it be better to eliminate fragment and vertex shaders and just use WASM for sending shaders to the GPU, or is the domain specific language and its constraints beneficial to the GPU drivers?

crowwork · 3 years ago

Check out https://mlc.ai/web-stable-diffusion, which builds on top of Apache TVM and brings in models from PyTorch 2.0, ONNX and other means into the ML compilation flow

That's pretty neat. I'm personally wondering how far ML compute will be done on consumer devices, rather than on servers. We're currently seeing a lot of models that are so large that it doesn't seem feasible to run them locally. But I think there is reason to believe that these models carry a lot of redundancy. Redundancy that could lead to an order of magnitude less memory/compute needed.

Or perhaps hardware will catch up before.

nwoli · 3 years ago

The trick here will be using large models as data generators to distill some sub task into a web computable model. (I’ve done it a few times for vision rather than text and it’s amazing how potent it is.)
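A toy sketch of that idea, in Python, with everything in it hypothetical: `teacher` stands in for a large, expensive model, and the "student" is a nearest-centroid classifier small enough to run anywhere. The point is only the workflow (sample inputs, let the big model label them, fit a small model on those labels), not the specific models.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]


def teacher(p: Point) -> int:
    """Hypothetical stand-in for a large model: returns label 1 or 0."""
    x, y = p
    return 1 if x + y > 0 else 0


def train_student(points: List[Point], labels: List[int]) -> Tuple[Point, Point]:
    """Fit a tiny nearest-centroid student: one mean point per teacher label."""
    sums = {0: [0.0, 0.0, 0], 1: [0.0, 0.0, 0]}
    for (x, y), label in zip(points, labels):
        sums[label][0] += x
        sums[label][1] += y
        sums[label][2] += 1
    c0 = (sums[0][0] / sums[0][2], sums[0][1] / sums[0][2])
    c1 = (sums[1][0] / sums[1][2], sums[1][1] / sums[1][2])
    return c0, c1


def student(p: Point, centroids: Tuple[Point, Point]) -> int:
    """Predict the label whose centroid is closest to p."""
    c0, c1 = centroids
    d0 = (p[0] - c0[0]) ** 2 + (p[1] - c0[1]) ** 2
    d1 = (p[0] - c1[0]) ** 2 + (p[1] - c1[1]) ** 2
    return 1 if d1 < d0 else 0


rng = random.Random(0)  # fixed seed so the run is repeatable


def sample(n: int) -> List[Point]:
    return [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]


# 1. Use the teacher as a data generator: synthetic inputs, teacher labels.
train_points = sample(2000)
train_labels = [teacher(p) for p in train_points]

# 2. Fit the small student on the teacher's labels only.
centroids = train_student(train_points, train_labels)

# 3. Check how often the student agrees with the teacher on fresh inputs.
test_points = sample(500)
agreement = sum(student(p, centroids) == teacher(p) for p in test_points) / len(test_points)
print(f"student/teacher agreement: {agreement:.2%}")
```

In a real setup the teacher call is the expensive part, so it runs once offline and only the student ships to the browser.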

ShamelessC · 3 years ago

Right! In a lot of cases, just having the synthetic responses plus human filtering for your sub task is enough for less essential tasks. I’m thinking of “procedural” content useful for less sensitive things like games.

thebruce87m · 3 years ago

Can you describe the vision bit? I have a general idea but would like to know the details, e.g. which models you used.

simonw · 3 years ago

It's possible to run a full GPT-3 style language model on any device with 4GB of RAM now, so running models on consumer devices is getting more and more feasible by the day. https://simonwillison.net/2023/Mar/11/llama/
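A back-of-the-envelope check on that 4GB figure, as a Python sketch. The assumptions are mine, not the linked post's: a nominal 7 billion parameters, weights only (no activations, KV cache, or per-block quantization scales), and sizes reported in GiB.

```python
GIB = 1024 ** 3


def weight_bytes(num_params: int, bits_per_param: int) -> float:
    """Storage for the weights alone, ignoring any runtime overhead."""
    return num_params * bits_per_param / 8


params_7b = 7_000_000_000  # nominal "7B"; an assumption for illustration

fp16_gib = weight_bytes(params_7b, 16) / GIB
int4_gib = weight_bytes(params_7b, 4) / GIB

print(f"16-bit weights: {fp16_gib:.2f} GiB")
print(f" 4-bit weights: {int4_gib:.2f} GiB")
```

Under those assumptions the 16-bit weights come to roughly 13 GiB and the 4-bit weights to a bit over 3 GiB, which is why quantization is what makes a 4GB device plausible.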

whimsicalism · 3 years ago

It's possible to run an RLHF tuned Llama 7b model. Whether this is "full GPT-3 style" is up for debate.

illiarian · 3 years ago

> I'm personally wondering how far ML compute will be done on consumer devices, rather than on servers.

Running ML on the device has been one of Apple's value propositions for a long time. They are currently silent on everything that's unfolding, but I expect them to at least mention something at WWDC (and to try running that something on the device)

bredren · 3 years ago

If I understand correctly, there was an all-company invited annual AI day which was silent on recent developments.

But then ~two weeks later there was what seemed like an on-background / press leak about the XDG group that specifically mentioned AI as a current discipline. (Gurman / Bloomberg)

It seems to me that the release of Core ML stable diffusion (mentioned itt) is something of a comment in and of itself. At least in the read between the lines / hiding in plain sight style of Apple.

The company is unveiling a new and presumably next major computing platform at a quality level only they could possibly deliver.

So the relative quiet / lack of comment may be in deference to the gravity of that work.

That said, these changes are too big to ignore; we should at least hear language that acknowledges the major developments in AI of late at WWDC and some idea for how Apple is thinking about them.

refulgentis · 3 years ago

They’re there, released Core ML Stable Diffusion a couple months ago.

Ameo · 3 years ago

> Or perhaps hardware will catch up before.

I feel like that's been the pretty consistent lesson in computing over the past decades. New technologies start out as expensive, exotic, and specialized and become cheap and commonplace over time. The more business value the technology provides, the faster it will happen as well I think.

The models will certainly get better (faster to train, less data needed, smaller param counts, etc.) too, though, just like compilers and software have evolved hugely alongside hardware.

yieldcrv · 3 years ago

they'll meet in the middle. that's what's already happening, and there will probably be co-processors added into consumer devices that excel specifically at the kind of processing that these models need.

dragonwriter · 3 years ago

> there will probably be co-processors added into consumer devices that excel specifically at the kind of processing that these models need.

There already are, e.g., Google Edge TPU, Apple Neural Engine, etc.

xenova · 3 years ago

Hi everyone! Creator of Transformers.js here :) ...

Thanks so much to everyone for sharing! It's awesome to see the positive feedback from the community. As you'll see from the demo, everything runs inside the browser!

As of 2023/03/16, the library supports BERT, ALBERT, DistilBERT, T5, T5v1.1, FLAN-T5, GPT2, BART, CodeGen, Whisper, CLIP, Vision Transformer, and VisionEncoderDecoder models, for a variety of tasks including: masked language modelling, text classification, text-to-text generation, translation, summarization, question answering, text generation, automatic speech recognition, image classification, zero-shot image classification, and image-to-text. Of course, we plan to add many more models and tasks in the near future!

Try out some of the other models/tasks from the "Task" dropdown (like the code-completion or speech-to-text demos).

---

To respond to some comments about poor translation/generation quality, many of the models are actually quite old (e.g., T5 is from 2020)... and if you run the same prompt through the PyTorch version of the model, you will get similar outputs. The purpose of the library/project is to bring these models to the browser; we didn't train the models, so, poor quality can (mostly) be blamed on the original model.

Also, be sure to play around with the generation parameters... as with many LLMs, generation parameters matter a lot.

If you want to keep up-to-date with the development, check us out on twitter: https://twitter.com/xenovacom :)

gl-prod · 3 years ago

Can I use it in Deno? It requires a worker (fails in node because "self")

Yes, there are some workarounds you can do to get it working in non-browser environments. I do aim to get a permanent solution, which will ideally work out-of-the-box for both browser and node/deno environments.

Some other users also reported the issue (which stems from a bug in onnxruntime-web), and we were able to get it working in these cases:

1. https://github.com/xenova/transformers.js/issues/4
2. https://github.com/xenova/transformers.js/issues/19

rkagerer · 3 years ago

Is there an Optimus model yet for Prime number encoding?

rc202402 · 3 years ago

Good one

penny10k · 3 years ago

What did Optimus Prime say when he first learned about machine learning? "Autobots, roll out the algorithms!"

itsaquicknote · 3 years ago

Hah, ChatGPT has successfully poisoned the well. Well done sama.

This lib is great work, a JS interface for running HF models. The comments about how "bad" the outputs are are as surprising to me as they are alarming.

OAI has now set the zero-effort bar so high that even HNers (who click on .js headlines) fall into the gap they've left. That sucking sound you hear is market share being hoovered up.

buryat · 3 years ago

your comments are very snarky

It would be great if we all try to keep the tone respectful and avoid snarkiness to maintain a constructive discussion

https://news.ycombinator.com/newsguidelines.html

No they're not mate, it's just you. I've read the guidelines (thanks for helpfully linking them). I see this on HN, people infer offense and cite the book rather than engage.

By not highlighting what you found "snarky" your response is a definitional "shallow dismissal". I see you just "picked the most provocative thing to complain about". Not a lot of being "kind" either.

So you know what would also be great? If you held yourself to the standards you're keen to police around here.

anonu · 3 years ago

I typed in 1 2 3 4 5 6 in a text generation task with length=500 and got this:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 4142 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 1 2 3 4 5 6 7 8 9 10 11 12 13 15 15 16 16 18 19 20 21 22 23 24 25 25 26 27 28 29 30 31 32 32 33 34 35 36 37 38 39 41 42 44 45 46 47 48 50 51 53 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 85 86 87 88 89 90 92 93 94 95 97 98 99 100

This is the third time that a candidate has been elected. In this article I will use the names of the candidates and the candidates. In 2016 the following is a list of the current and former U.S. presidential candidates: Former Bush Former Bush Former Bush Former Bush Former Bush Former Bush Former Bush Former Bush Former Bush Former Bush Former Bush Former Bush/Bush/Bush/Bush (with Republican presidential candidates) Former Bush Former Bush Former Bush Former Bush Former Bush Former Bush Former Bush Former Bush/Bush/Bush/Bush/Bush Former Bush Former Bush Former Bush Former Bush Former Bush Former Bush Former Bush Former Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/B-1919191929

drcongo · 3 years ago

Gonna use this as lyrics, thanks.

sva_ · 3 years ago

justinator · 3 years ago

Hmm, this works with literal translation, then?

    Hello, how are you?

is literally,

    Bonjour, comment êtes-vous?

But usually you would say,

    Bonjour, comment ça va?

(Hello, how goes it?)

Which the model likes to translate to,

    Bonjour, comment est-ce faite?

Which no French person would ever say to you because that's a lot of words and doesn't really sound very... French.

And of course are you talking to someone familiar... so on and so forth.

Hi! Creator of the library here. If you change the generation parameters to be greedy (i.e., sample=no and top_k=0), you will get "Bonjour, comment êtes-vous?"

The top_k and sample generation parameters are just there to show that they are supported :), and are sometimes useful for the other tasks (like text generation w/gpt2, to get more variety)
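For anyone unsure what those two knobs do, here is a rough Python sketch of a single decoding step. It is not the library's code: the function, the parameter names, and the logit values are all invented for illustration, and only the general technique (greedy pick vs. sampling from the top-k candidates) is standard.

```python
import math
import random
from typing import Dict


def softmax(logits: Dict[str, float]) -> Dict[str, float]:
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    m = max(logits.values())
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def next_token(logits: Dict[str, float], sample: bool, top_k: int,
               rng: random.Random) -> str:
    """Pick the next token from one step's scores."""
    if not sample:
        # Greedy: always take the single highest-scoring token.
        return max(logits, key=logits.get)
    ranked = sorted(logits, key=logits.get, reverse=True)
    # top_k <= 0 means "no filtering": sample from the full distribution.
    kept = ranked[:top_k] if top_k > 0 else ranked
    probs = softmax({tok: logits[tok] for tok in kept})
    return rng.choices(list(probs), weights=list(probs.values()), k=1)[0]


# Hypothetical scores for the continuation of "Bonjour, comment ..."
step_logits = {"êtes-vous": 2.0, "allez-vous": 1.5, "ça va": 1.0, "est-ce faite": 0.2}

rng = random.Random(0)
greedy = next_token(step_logits, sample=False, top_k=0, rng=rng)
sampled = [next_token(step_logits, sample=True, top_k=2, rng=rng) for _ in range(20)]

print(greedy)        # êtes-vous
print(set(sampled))  # only ever drawn from the two highest-scoring tokens
```

Greedy decoding is deterministic, which is why it gives the same translation every time; sampling trades that for variety, and a low-probability pick early on can derail the rest of the sentence.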

scambier · 3 years ago

I understand there are reasons the translation is incorrect, but if the very first example you're showing on the page is wrong, most people (who are fluent enough) will just roll their eyes and leave it at that. Maybe showcase an example that works?

t00ny · 3 years ago

I did a couple of tries with simple sentences in French and the results were not great. But it’s still impressive.

Edd314159 · 3 years ago

I uploaded the Windows XP desktop wallpaper into the image classifier. Just the raw image file. It gave me the labels "monitor", "computer screen", "desktop". "Field", "sky", "grass", that kind of thing were nowhere to be found.

I know this is more of a comment on the state of AI models than Transformers.js. It's probably not even representative of state-of-the-art image classifier models. Just a fun example of how these things learn.

Haha very interesting! I assume it's because that type of image is only found on computer screens, so, the model thinks the grass "contributes to its idea of what a computer screen is".

... and of course, the library only ports those models to the browser; if you train a better model, you can always convert it to the ONNX format, then use it with the library.

iLoveOncall · 3 years ago

Even the default example of "Hello, how are you?" from English to French yields an awfully wrong result ("Hello, what is your experience?")...

I wouldn't trust them for anything else.

The other models are not better, here's the text generation output from "I enjoy walking my cute dog":

> I enjoy walking with my cute dog, I have been going to the park, and I just happened to like walking with my cute dog. I like to play with the dog. My dog (Hannah) has been on my way home since December and when she came home she told me to go out and stay back. I told her that she had been too busy. I had to start working and had to go outside and go see myself again.

It could be just an algorithm that generates random sentences and it wouldn't make less sense.

Hi there! Creator of Transformers.js here :)

I think it's worth pointing out that the library just gets the models working in the browser. The correctness of the translation is dependent on the model itself.

If you run the model using HuggingFace's python library, you will also get the same results (I've tested it, since, I wasn't too happy with those default translations and generations).

With regards to the text generation output, this is also similar to what you will get from the PyTorch model. Check out this blog post from HuggingFace themselves which discusses this: https://huggingface.co/blog/how-to-generate.

> Even the default example of "Hello, how are you?" from English to French yields an awfully wrong result ("Hello, what is your experience?")...

Really? For me that gives "Bonjour, comment êtes-vous?" with the default settings.

> text generation output

Yeah, text generation is really something that requires a big model. The Llama 7B param model is about 13G at 16-bit (roughly 4G once quantized to 4bit), and that is the smallest model I'd actually attempt to use for unconstrained text generation.

wdaher · 3 years ago

> "Bonjour, comment êtes-vous?"

The idiomatic translation here would be "Bonjour, comment allez-vous?"

Fiaxhs · 3 years ago

« Bonjour, comment êtes-vous? » barely translates to « Hi, how are you feeling today? » or, depending on the context, to something like « Hi, please describe yourself » to a native French speaker.