But the migration to ARM is proving to be quite a pain point. Not being able to just do things as I would on x86-64 is hurting my productivity and forcing horrible workarounds.
As far as I know none of our pipelines yet do multi-arch Docker builds, so everything we have is heavily x86-64 oriented. VirtualBox is out of the picture because it doesn't support ARM. That means other tools that rely on it are also out of the picture, like Molecule. My colleague wrote a sort of wrapper script that uses Multipass instead but Multipass can't do x86-on-ARM emulation.
I've been using Lima to create virtual machines which works quite well because it can do multiple architectures. I haven't tested it on Linux though, and since it claims to be geared towards macOS that worries me. We are a company using a mix of MacBooks and Linux machines so we need a tool that will work for everyone.
The virtualisation situation on MacBooks in general isn't great. I think Apple introduced Virtualization.framework to try to improve things, but the performance is actually worse than QEMU. You can try enabling it in the Docker Desktop experimental options and you'll notice it gets more sluggish. Then there are other annoyances, like having to run a VM in the background for Docker all the time because 'real' Docker is not possible on macOS. Sometimes I'll have three or more VMs going and everything except my browser is paying that virtualisation penalty.
Ugh. Again, I love the performance and battery life, but the fragmentation this has created is a nightmare.
How is your experience so far? Any tips/tricks?
In reality I have hardly turned on the Intel MBP at all since I got it. At all.
Docker and VMware Fusion both have Apple Silicon support, and even in "tech preview" status they are both rock solid. Docker gets kudos for supporting emulated x86 containers, though I rarely use them.
I was able to easily rebuild almost all of my virtual machines; thanks to the Raspberry Pi, almost all of the packages I use were already available for arm64, though Ubuntu 16.04 was a little challenging to get running.
I also had to spend an afternoon updating my CI scripts to cross-compile my Docker containers, but this mostly involves switching to `docker buildx build`.
Rosetta is flawless, including for userland drivers for USB and Bluetooth devices, but virtually all of my apps were rebuilt native very quickly. (Curious to see what, if anything, is running under translation, I just discovered that WhatsApp, the #1 Social Networking app in the App Store, still ships Intel-only.)
Oftentimes people experience different levels of difficulty using Apple Silicon precisely because my workload is not yours, and yours is different again from OP's.
So I feel this particular Ask HN is more about wondering how different everyone's workflows are, and how that impacts M1 usage.
I envision that workflow options/pathways will start converging into one "way", which is the Apple way. You're already getting shunted into relying purely on Metal for GPU acceleration, and you can see the plurality of GPGPU libraries converging on only the Apple-blessed, authorized, and optimized versions.
There are people fighting against this, for example the Linux on Apple Silicon project bringing up the GPU, but it's slow going.
Give it another few years, and people will stop using x, y, or z frameworks, and only use whatever APIs Apple gives us, because that is the Apple Way.
Proceed at your own peril. The future is fast, but there is only one road.
https://developer.apple.com/documentation/apple-silicon/abou...
There are a handful of apps that aren't supported, but few of these are popular apps. VirtualBox is notable, but unsurprising: Rosetta is not designed for x86_64 virtual machines, and VirtualBox doesn't support arm64. (I submitted a correction to the Wine entry, since wine64 has worked under Rosetta 2 for a year.)
https://isapplesiliconready.com/for/unsupported
https://liliputing.com/2021/06/wine-6-0-1-lets-you-run-windo...
On my M1 I see 16x performance differences in builds in favour of native over emulated. Even simple shell scripts run slowly or seem to stall when emulated.
E.g. if I wanted to start building ARM binaries on an x86 host, is that the sort of thing this would enable?
docker buildx build --platform linux/amd64,linux/arm64 .
I constantly use buildx to build x86 images on my M1 so that I can run these images in my x86 Kubernetes cluster.
https://docs.docker.com/buildx/working-with-buildx/
I found this article with some overview and example command lines: https://www.docker.com/blog/multi-arch-images/ . As best I can tell, you don't actually need a custom builder. You can just skip the `docker buildx create` and go straight to `docker buildx build` after one workaround below.
I needed the workaround from this comment to make it work (to install qemu? not sure): https://github.com/docker/buildx/issues/495#issuecomment-754...
Overall a very slick experience for going from zero to multiarch containers. Well done, Docker.
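Putting the pieces from this thread together, here's a minimal sketch of the whole flow, assuming a recent Docker with buildx. The image name is a placeholder, and the one-time binfmt step is the QEMU workaround the linked issue describes:

```dockerfile
# One-time setup on the build host: register QEMU binfmt handlers so that
# foreign-architecture RUN steps can execute under emulation:
#
#   docker run --privileged --rm tonistiigi/binfmt --install all
#
# Then build (and optionally push) both architectures in one invocation:
#
#   docker buildx build --platform linux/amd64,linux/arm64 -t myimage --push .
#
# The Dockerfile itself usually needs no changes; buildx injects TARGETARCH
# if a step needs to know which platform it is currently producing.
FROM debian:bullseye-slim
ARG TARGETARCH
RUN echo "building for ${TARGETARCH}" \
    && apt-get update \
    && apt-get install -y --no-install-recommends ca-certificates \
    && rm -rf /var/lib/apt/lists/*
```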
There's a useful app called Silicon Info on Github (https://github.com/billycastelli/Silicon-Info) and also on the Mac App Store.
It adds a menu bar icon that switches according to the currently-focused app's architecture.
We had to put in so much effort just to run things on Rosetta because all of our compiled code had AVX enabled. We also needed to chase down pre-compiled binaries and recompile them without AVX; we still haven't finished this work.
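For anyone hitting the same wall: Rosetta 2 doesn't translate AVX, so the emulated CPU doesn't advertise it, and code that probes the feature flags before using AVX degrades gracefully instead of dying. A quick Linux-side sketch of that probe (on macOS you'd query `sysctl` instead):

```shell
# Check whether the CPU the process sees advertises AVX. Under Rosetta 2
# the feature flag is masked, which is why binaries hard-compiled with
# -mavx crash instead of falling back to an SSE code path.
if grep -qw avx /proc/cpuinfo 2>/dev/null; then
    msg="avx available"
else
    msg="no avx: ship -mno-avx builds or use runtime-dispatching libraries"
fi
echo "$msg"
```

Recompiling with `-mno-avx` is the blunt fix; libraries that do runtime dispatch pick the right code path on their own.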
https://developer.apple.com/documentation/virtualization/run...
https://developer.apple.com/videos/play/wwdc2022/10002/
If you are not familiar, Rosetta is how Apple Silicon Macs run existing x86 Mac binaries, and it is highly performant. It does binary pre-compilation and caching. It also works with JIT systems. Apple is now making it available within Linux VMs running on Macs.
The last thing I read showed about 70% of native performance when running Geekbench through Rosetta (with a few odd results noted).
If somebody has better info...
Edit: I see that Nov 2020 checks returned an 80% performance, and there was discussion on HN at (at least) https://news.ycombinator.com/item?id=25105597
ARM Geekbench single core on M1 MacOS is 1734. ARM Geekbench single core on WinARM in VM on M1 is 1550. x86 single core on i9 MB Pro MacOS is 1138. x86 in emulation on M1 MacOS is 1254.
Yes, 72% x86 Rosetta vs. M1 native. However, x86 Rosetta on M1 was faster than the previous i9 2019 MacBook Pro running x86 natively. I consider that to be performant for running code that was compiled for a very different architecture.
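Working through the arithmetic from the scores quoted above, just to make the percentages explicit:

```shell
# Percentages derived from the Geekbench single-core scores above:
# 1734 native arm64 on M1, 1254 x86 under Rosetta on M1, 1138 native x86 i9.
rosetta_vs_native=$(awk 'BEGIN { printf "%.0f", 1254 / 1734 * 100 }')
rosetta_vs_i9=$(awk 'BEGIN { printf "%.0f", 1254 / 1138 * 100 }')
echo "Rosetta vs. M1 native: ${rosetta_vs_native}%"   # 72%
echo "Rosetta vs. i9 native: ${rosetta_vs_i9}%"       # 110%
```

So emulated x86 on the M1 is not only 72% of the M1's native speed, it is about 10% faster than the i9 running its own native instruction set.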
Benchmarks are good for bragging rights and maybe convincing over-zealous accounting to approve a purchase (but even then that’s probably not all there is to it.)
Anyway, I can say that my colleague's M1 using Rosetta is faster than or equal to my MBP i9 2020.
When benchmarking x86 and ARM containers, our application seems to be around 5x slower with x86 under Rosetta, and similar slowdowns can be observed for mysql-server or just running `apt install`.
This is still significantly better than using qemu emulation, but it's not really usable in our case.
I've also encountered segmentation faults when running x86 `npm` inside Docker, so I couldn't even install packages, but I didn't dig further into the cause.
(Note: I've created a simple macOS app using Virtualization framework, enabled Rosetta, and loaded Ubuntu Focal. I've installed the latest version of Docker, which automatically used `rosetta` when encountering x86 executables. Maybe this setup is not ideal.)
Much more impressively it also leverages a custom hardware x86-like memory model unique to the M1/Apple ARM chips. That's where most of the performance really comes from, as I understand it.
Hopefully, this is the right timestamp:
https://developer.apple.com/videos/play/wwdc2022/10002/?time...
My solution was to give up using my M1 mac for development work. It sits on a desk as my email and music machine, and I moved all my dev work to an x86 Linux laptop. I'll probably drift back to my mac if the tools I need start to properly support Apple Silicon without hacky workarounds, but until GitHub actions supports it and people start doing official releases through that mechanism, I'm kinda stuck.
It is interesting how much impact GitHub has had by not having Apple Silicon support. Just look at the ticket for this issue to see the surprisingly long list of projects that are affected. (See: https://github.com/actions/virtual-environments/issues/2187)
That wouldn't be too much of an issue if you could just cross-compile like you can with Go. However, GraalVM can't do this yet.
In a nutshell, I don't see how having Apple Silicon locally creates the problem: if your non-local envs (dev, stage, prod) are running on x86 Linux or even ARM Linux, there shouldn't be any issue building for those architectures on your build farms anyway.
I may be missing some important part here.
Putting on my tin-foil hat for a sec: GitHub is owned by Microsoft, who would really stand to benefit from slowing down Apple Silicon adoption a bit...
As a primarily Linux user these feel like very familiar stories.
It's kinda refreshing to hear those stories from mac users. Maybe we are not so different after all.
After all, macOS is certified Unix and Linux is Unix-like.
A working GUI, to be exact. Source: switched to Macs from Linux in 2013.
They all just use VS Code or JetBrains and that's about it, so hell if I know why they need a $3000 machine to run shell commands they don't understand.
I desperately wish one of the big boys would push enterprise Linux dev machines hard.
2. Everything you install will be ARM-based. Docker will pull ARM-based images locally. Almost every project (that we use) now has ARM support via Docker manifests.
3. Use binfmt to cross-compile x86 images within prlctl, or have CI auto-build images on x86 machines.
That pretty much does it.
We were using UTM but have recently switched to Parallels, which is nice.
Our prod stayed on x86, but we've started moving to Graviton3, which is better bang for the buck. I suspect it'll end up being a common story for others too.
M1s are just such nice machines that I'd go quite out of my way to stay on them now.
https://news.ycombinator.com/item?id=31094361&p=2#31101729
https://news.ycombinator.com/item?id=31094361#31098721
https://news.ycombinator.com/item?id=27825240#27825420
This makes so much sense now for many workflows. I no longer complain about my computers being slow, so I don't even think of upgrading; and if something's annoying, it's mostly about software rather than hardware anyway, so there's no point in upgrading (although the M1 seems to have convinced a lot of people otherwise). Looking forward to adopting this new tech... in 3-5 years or so.
This also makes it cheaper to upgrade through second-hand devices: just stay one or two models behind.
I'm only half joking. I'm in the group of people who know that Docker is a security nightmare unless you're generating your Docker images yourself, so wherever I've had to support it, I've insisted on that. If you don't use software that's either processor-centric (and therefore buggy, IMHO) or binary-only, then this is straightforward and a win for everyone.
Run x86 and amd64 VMs on real x86 and amd64 servers, and access them remotely, like we've done since the beginning of time (teletypes predate stored program electronic computers).
Since Docker is x86 / amd64 centric, treat it like the snowflake it is, and run it on x86 / amd64.
I work on scientific software, so the biggest technical issue I face day-to-day is that OpenMP based threading seems almost fundamentally incompatible with M1.
https://developer.apple.com/forums/thread/674456
The summary of the issue is that OpenMP threaded code typically assumes that a) processors are symmetric and b) there isn't a penalty for threads yielding.
On M1 / macOS, what happens is that during the first OpenMP for loop, the performance cores finish much faster, their threads yield, and then they are forever scheduled on the efficiency cores which is doubly bad since they're not as fast and now have too many threads trying to run on them. As far as I can tell (from the linked thread and similar) there is not an API for pinning threads to a certain core type.
GOMP_CPU_AFFINITY="1 2 5 6"
With thread 1 bound to core 1, thread 2 on core 2, thread 3 on core 5, and thread 4 on core 6. I don't have an M1 to play around on, but I'd have assumed that the cores have fixed IDs.
Aside from that, if the workload is predictable in time, using a more complex scheduling pattern might help. You could perhaps look at how METIS partitions the workload, and see if it's modifiable by adding weights to the cores reflecting their relative performance. Generally, to get good OMP performance I always found it better to treat it almost as if it's not shared memory, because on HPC clusters you have NUMA anyway, which drags performance down once you have more threads than a single processor in the machine has cores.
I agree with your other points though!