I was not expecting Apple to release any open-source generative AI product. So the big players in the download-it-yourself gang are Meta & Apple? I don't think anyone would have taken that bet when OpenAI blew up.
Apple makes a ton of sense. They sell hardware (and services for that hardware). It would make sense that they want to commoditize foundation models while at the same time creating yet another reason to buy their most powerful hardware.
Meta is the real wildcard here. They're betting on "commoditize your complement" and hoping that by making GenAI training nearly free (by doing it for you), none of their competitors can use GenAI models to gain an advantage over them.
Why is Meta the wildcard? Meta has the longest track record of open sourcing everything, from hardware (the OpenCompute suite of releases) to pretty much all software (HHVM, Hack, Buck, etc.) that would benefit from wider use.
Even Meta's ML research has a history of open sourcing everything, hello PyTorch. So Llama is just a continuation.
Historically, Apple doesn't sell hardware; it sells a combination of software and hardware that is supposedly more tightly integrated than is possible when the two are developed separately. It's interesting and surprising that they would develop and release an open source model.
Inference should ideally be done on an instance with a GPU, and it should have enough GPU VRAM to hold the entire model. This one is only a 7B model, so it should run pretty easily in a modest amount of VRAM. You can likely run it locally in CPU-only mode, too, though it's likely to be rather slow.
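A quick back-of-the-envelope check of that claim: weight memory scales with parameter count times bytes per parameter, plus some working overhead for activations and cache. A rough sketch (the 20% overhead factor here is an assumption for illustration, not a measured number):

```python
# Back-of-the-envelope VRAM estimate for a 7B-parameter model.
# Bytes per parameter depends on precision; activations and KV cache
# add overhead (a rough 20% is assumed here).

def estimate_vram_gb(n_params_billion: float, bytes_per_param: float,
                     overhead: float = 0.2) -> float:
    """Rough VRAM needed to hold weights plus working overhead, in GB."""
    weights_gb = n_params_billion * bytes_per_param  # 1B params * 1 byte ~ 1 GB
    return weights_gb * (1 + overhead)

for label, bpp in [("fp16", 2), ("int8", 1), ("int4", 0.5)]:
    print(f"7B @ {label}: ~{estimate_vram_gb(7, bpp):.1f} GB")
```

At fp16 the weights alone are ~14 GB, so a 16 GB card like the V100 mentioned below is tight; 8-bit or 4-bit quantization leaves comfortable headroom.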
You could try running these in Google Colab (though tweaks would have to be made to load the files), or you might try something like runpod.io, which gets you GPU instances for a lot less than you'd pay with AWS. For example, a Tesla V100 (16GB) community cloud instance runs you $0.24/hr vs. $0.42/hr for the g5g.xlarge.
The code repo this article is based on - https://github.com/apple/ml-mgie - is based around InstructPix2Pix (which is based on Stable Diffusion) and LLaVA (multi-modal LLM that supports vision input).
It's research code, not a product. I don't think Apple would ever productionize anything with data lineage from Stable Diffusion.
This is going to be really cool: being able to have pictures described with AI and then edit them using text, like "underline this part of the screenshot" or something like that. Especially with LLaVA 1.6 and Apple's other model that understands spatial parts of images, this should be pretty possible for me, as a blind person, to do. So yeah, I think I'll wait for this September's iPhone.
The significant part of this, for me, is not the image model. It's that Apple rarely publishes in the vision field. Back at NeurIPS (2016, I want to say?) Apple had a workshop where they said "we're going to engage more with the academic community, we're going to publish more," but since then only a small number of CV projects have seen the academic light of day. I remember one on detection of flashing lights, a whitepaper/tech report on Hey Siri wakeword detection, some interesting synthetic image generation for iris recognition, and now this.
If I had to guess, I think Apple Research may have some institutional obstacles to getting research work out there. Perhaps this could indicate those walls are continuing to dissolve bit by bit?
They have at least two recent papers (MobileOne and FastViT) with code and weights for low-latency (as measured on iPhone) vision backbones. Also AIM (Autoregressive Image Models) released last month.
It feels like Apple has been very quiet about their approach in some of the future spaces (self-driving, VR/AR, AI) but then they suddenly burst out with something like Vision Pro.
It seems like they are biding their time and avoiding the hype, knowing that they don't need to get there first. If they can produce a superior product, their brand will let them dominate those markets whenever they decide to release.
> It seems like they are biding their time and avoiding the hype knowing that they don't need to get there first
This idea that Apple gets to things late to "perfect them" is an artifact of the iPod/iPhone era. Nothing Apple put out since then has met this bar.
Apple does things low-key these days because it has to. Remember the Apple Maps fiasco? A few memes were all it took to send everyone scurrying back to GMaps for a decade.
I’m not at all convinced about the inevitable success of the Vision Pro (which I maintain is, in its current form, a dev kit being marketed as a mass-consumer device), but this just isn’t true.
To use a few examples that have happened after Apple Maps:
* The Apple Watch — Apple was at least 2 years behind other smartwatches when it launched in 2015, and now it basically owns the category. The first version wasn’t quite there, but they nailed it with the Series 2 and the pivot from fashion to fitness.
* AirPods - tons of companies had been trying truly wireless earbuds for years; I know, I had tested basically every option commercially available. Although imperfect at launch (and, for once, the software got better over time), AirPods were category-defining from the moment I first used them in September 2016, and the absolute best truly wireless earbuds you could get on the market at any price point, even though the sound quality was mid. Even now you can beat standard AirPods on price/performance, but it’s hard to top the AirPods Pro 2 unless you want to spend significantly more money and sacrifice ecosystem niceties (assuming you’re an iOS user; and the Venn diagram of people willing to spend over $250 on truly wireless earbuds and iPhone users is basically a circle).
What I do think is instructive about the Apple Maps example is that when it comes to services (and I put generative AI in the services camp much more than hardware), Apple is not best-in-class at all. As a package, Apple services can be beneficial, especially if you’re part of the ecosystem. But even though I pay Apple $38 a month for Apple One Premier, I’m still paying separately for Spotify, Dropbox, OneDrive (part of Office 365, but still), and Netflix (and a bunch of other video services), and I don’t think I could reasonably drop any of those in place of just using iCloud/Apple TV/Apple Music, etc. My parents can (except for Netflix/Prime Video), and Apple services are definitely “good enough” for a lot of people in the ecosystem, but none are best-in-class.
Unclear whether they’d be able to be “good enough” to displace other generative AI services here, but to your point, Apple Maps still hasn’t gotten over its bad launch. And Siri on the phone is still dog shit: so bad that I won’t even use it on platforms where it is objectively good, like Apple TV, unless I specifically remember to.
iPad? Apple Watch? Slim laptops (involving aircraft-metallurgy tricks)?
Apple's been accused of copycatting practically since the beginning. But if you only copycat, you don't typically last this long. Someone can always offshore somewhere cheaper if you stand still long enough.
> If they can produce a superior product their brand will let them dominate those markets whenever they decide to release.
I anticipate that Apple's Pro devices (iPhone, MacBooks) will have beefier hardware designed specifically for this use case. My guess is they will sell a much higher proportion of Pro iPhones as a result. They'll probably also sell a lot more new devices overall, contra the trend toward slower upgrade cycles.
At the same time, I think third-party companies will find a way to do pretty decent on-device (and very good cloud-based) AI, so it will be a tradeoff in terms of speed and privacy. If you want the best speed and no data being shared with a cloud-based AI provider, the Pro devices will be aimed at you.
Apple has always been aware that "first mover advantage" is largely bullshit. "The pioneers get all the arrows." The sweet spot is to be second or third to market.
Someone just noticed Apple’s decades-old SOP: watch the rest of the industry fumble about and fail over and over, then take the few ideas that survive the rapid, aimless iteration and add the last mile of polish.
A decade of crappy Palm, WinCE, and easily forgotten Java-based mobile devices came and went before the iPhone.
They know what software people refuse to accept; it’s the hardware experience that matters, and they’ll wait until the hardware is there.
Modern hardware is responsible for AI, more reliable networks, and the rest of modern compute. It has little to do with the bloated software stacks we git pull into the data center.
Edit: Also consider that Intel and Apple don’t just make a consumer device; they design an advanced manufacturing pipeline. The difference between print("hello world") and for index in array: print(contents of index).
That’s the basis of their value and why they are propped up by government and why software focused startups will always just be pump and dump schemes.
Apple is the master of the second mover advantage. Let the existing companies burn money in R&D, but closely monitor how customers perceive their products. What pain points exist that the current companies haven't addressed? What are the pitfalls?
The first mover advantage gives you a chance to get most of the market share, but with Apple's walled garden, they don't need the first mover advantage to do that.
It's not the hardware, it's the combination of good hardware and software. That's paraphrasing Steve Jobs. He also stated that one of the primary advantages Apple had was software with better usability, generally higher quality software.
> Modern hardware is responsible for AI
No, again, it's the hardware and software. TensorFlow is what formally sparked the AI boom that we're still riding, and of course it needed the underlying hardware.
The hardware does nothing without the software. Python and Linux are good software, to name two.
Apple's real SOP is to make white males believe in the inevitability of Apple cultural dominance so they feel inclined to buy anything with the logo on it as a signal
i.e. you convinced yourself Vision Pro was great before ever using it
I presume they have an SE model in mind for their second iteration. Maybe there's a mothballed project we don't know about, or maybe someone at Apple is tired of rebranding product lines and insisted on making space for both.
Or maybe someone thinks they're managing expectations on price by putting Pro in the name, since it's on par with their other Pro products.
Speaking as a VR enthusiast who has virtually no interest in it: the Vision Pro is an entirely new product relative to any other VR headset I know of. It offers a comprehensive set of computing features (discounting an App Store which has only really just started) that puts it roughly on par with something like iPadOS. I would say it currently dominates for VR headsets if that is what you're after, because it's basically your only option.

Desktop computing in VR is possible with things like a Vive or an Index, but the experience is entirely different, worse on every front, and is basically done as a workaround in case you need to operate your PC while in VR without taking off the headset. And of course, all of those headsets are (quite intelligent, but still merely) displays: they require a fairly powerful computer to attach to. There are headsets that operate independently too, like Meta's offerings, but those are much, much, much more restrictive in terms of capability than the Vision Pro.
My interest in VR is basically solely for gaming, so the Vision Pro for me is a complete miss and I have no plans to purchase one. That being said, in the realm of productivity VR, I'd say they are dominating in that they're the sole entrant to the market that I'm aware of and their offering is extremely compelling.
You can run your laptop and other apps on multiple screens, as big and laid out however you want, while still having a usable (visible) keyboard? Is the latency such that you don't get nauseous? Seems useful for productivity. Do other goggles have these features? Yes, but there's a difference between being a toy and actually being as good as or better than how I work now.
https://instances.vantage.sh/?min_gpus=1 (wait for it to finish loading to filter)
https://huggingface.co/spaces/tsujuifu/ml-mgie
(Long queue, I could not test it)
Apple must have got to him...
Spend a few thousand dollars to get provisionals. Then if your idea works and grows, file the full patents.
Don't let the giants steal your blood, sweat, and tears finding the gradients.
Including the Apple Newton, arguably the first handheld PDA.
I also wonder why they broke with their naming scheme ("Pro" without any smaller, non-Pro model to be the Pro of).
https://apple.fandom.com/wiki/AppleVision_Display