So this is great but it might be worth pointing out that the library went from Canvas2D (slow) and ThreeJS (very general purpose) to pure WebGL calls tailored to the application, which alone probably would have been the most significant driver behind the performance improvements. It’s hard to see exactly how much WASM has helped on top of that, and I wonder if a pure JS+WebGL version would perform about as well at a fraction of the file size, complexity and parse time.
Usually I wouldn’t recommend going WASM unless you have very hot CPU code paths, like physics across thousands of 3D bodies and collisions that is eating up your frame time. In the case of an image viewer I’m not sure that code is ‘hot’ enough in CPU terms to really warrant all this.
(I’d love to be proved wrong with benchmarks and find more general uses of wasm tho !)
Just my 2c. Really great write up and work nonetheless!
Thanks for the comment! I agree that the change to a custom-tailored engine probably made the biggest difference performance-wise. At the end of the article I also briefly mention this.
However, having a monolithic compiled Wasm module which contains all of (and only) the rendering logic is really nice on a codebase-level.
Also, the Wasm part of Micrio is being used for a lot of realtime matrix4 calculations. While I didn't use this specific library, here is a bench page which shows that Wasm really kills JS performance here: https://maierfelix.github.io/glmw/mat4/ .
So it definitely makes a difference. If I had the time I would port the Wasm engine back to JS and do more benchmarking. But that's a big if :)
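To give an idea of the kind of per-frame work this is about: a plain column-major mat4 multiply, sketched below in the TypeScript-like syntax AssemblyScript uses (illustrative only, not Micrio's actual code).

    // Illustrative sketch: column-major 4x4 matrix multiply (out = a * b).
    // Compiled via AssemblyScript this becomes a tight numeric Wasm loop;
    // run as plain TS/JS it relies on the JIT warming up first.
    function mat4Multiply(out: Float64Array, a: Float64Array, b: Float64Array): void {
      for (let col = 0; col < 4; col++) {
        for (let row = 0; row < 4; row++) {
          let sum = 0.0;
          for (let k = 0; k < 4; k++) {
            sum += a[k * 4 + row] * b[col * 4 + k]; // a[row,k] * b[k,col]
          }
          out[col * 4 + row] = sum;
        }
      }
    }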
I can see how WASM definitely improves performance in those benchmarks (where we're talking about thousands of matrix operations). But I imagine your per-frame camera & matrix math is probably not taking up much time (within a 16ms budget), so a lot of these WASM optimizations may not have any real influence on the rendering performance. But they could introduce more upfront parsing/loading time for your library (i.e. base64 decoding, WASM instantiation), and obviously bring with them a lot more complexity (in authoring the module and supporting legacy devices).
Anyways, it's pretty cool that you can call GL functions directly from WASM, I hadn't realized that before and it probably would make for an epic game engine to have almost everything called from WASM land.
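If I understand the setup correctly, the wiring looks roughly like the sketch below: the Wasm module declares the GL calls as imports, and the JS glue binds them to the WebGL context at instantiation (hypothetical names, just a sketch).

    // AssemblyScript side (sketch): the GL call is declared as an import, e.g.
    //   @external("gl", "drawElements")
    //   declare function glDrawElements(mode: i32, count: i32, type: i32, offset: i32): void;

    // JS/TS glue side: bind the "gl" imports to a real WebGL context.
    async function initEngine(canvas: HTMLCanvasElement, wasmBytes: ArrayBuffer) {
      const gl = canvas.getContext('webgl')!;
      const { instance } = await WebAssembly.instantiate(wasmBytes, {
        gl: {
          // Each "direct" GL call from Wasm is an imported function that
          // forwards straight to the context -- one thin hop, no other glue.
          drawElements: (mode: number, count: number, type: number, offset: number): void =>
            gl.drawElements(mode, count, type, offset),
        },
      });
      return instance;
    }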
Regarding the "emscripten-imposed file size problem", I've written a couple of alternative cross-platform headers which might help in this case (e.g. avoiding SDL). These enable WASM apps in the "dozens of KBytes" range, and you get native-platform support "for free":
The binaryen toolkit [1] comes with a wasm2js tool; you could translate the wasm back to js and see how performance compares ;)

It's possible that performance isn't all that different, because asm.js-style Javascript can be surprisingly fast (compared to "idiomatic" human-written Javascript).

Otherwise it's a completely pointless exercise of course, unless you need to support browsers without WASM support (which don't exist anymore AFAIK).

[1] https://github.com/WebAssembly/binaryen
Just a PSA: if you include a .wasm binary in your app, especially if it is large, be sure to request it separately as a .wasm (with MIME type "application/wasm"), as your browser will cache the compiled code along with the original bytes in its HTTP cache, so you'll get lightning-fast startup.
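Something like this (sketch only; '/engine.wasm' is a placeholder path); with the right Content-Type you also get streaming instantiation, so compilation overlaps the download:

    // Streaming instantiation only works when the response is served with
    // Content-Type: application/wasm.
    async function loadWasm(imports: WebAssembly.Imports) {
      if ('instantiateStreaming' in WebAssembly) {
        return WebAssembly.instantiateStreaming(fetch('/engine.wasm'), imports);
      }
      // Fallback for engines without instantiateStreaming:
      const bytes = await (await fetch('/engine.wasm')).arrayBuffer();
      return WebAssembly.instantiate(bytes, imports);
    }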
It would be nice to know how much WebAssembly actually helped here vs just writing the WebGL in JavaScript. There are too many changes here to know if the speed-ups and size gains are related to wasm or just to dumping three.js and switching all the previous Canvas 2D rendering to WebGL rendering.
In fact (maybe I missed it), if they're using AssemblyScript is it possible to just run it through TypeScript and check?
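(Since AssemblyScript uses TypeScript syntax, the rough idea would be to add a small shim for the Wasm-specific number types and compile the same source with plain tsc; a hypothetical shim, not an official one:)

    // Hypothetical shim so AssemblyScript source type-checks under plain tsc:
    // the sized number types all collapse to JS's number.
    type i32 = number;
    type u32 = number;
    type f32 = number;
    type f64 = number;

    // The same AssemblyScript function body then runs as ordinary JS,
    // which makes an apples-to-apples benchmark possible.
    function scale(x: f64, factor: f64): f64 {
      return x * factor;
    }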
Someone else in this post asked the same question and I answered briefly there. Indeed the engine rewrite itself from general purpose to specific usage probably made the biggest difference performance-wise.
However, I do use matrix4 calculations for the 360 viewer, for which Wasm definitely is better.
Interesting idea to do a followup benchmark using the AssemblyScript as TS! I might actually have to do that :D
I apologise for making a meta point, but this is a fantastic example of how a story can take a while to "click" on HN. This is its sixth submission in 4 months (none by me, before you ask) and the first to actually get anywhere :-) There is so much gold flowing through HN /newest and I strongly encourage more people to check it out if possible.
Wow this is so awesome. I would be highly interested in how much the energy consumption drops by using WebAssembly. Considering that 5 billion people use smartphones and browse the web, if you could reach just 1% less CPU time on average, that would have a significant impact on worldwide energy consumption...
Really interesting question. While I don't have power-specific benchmarks, the general rule of thumb for power consumption is indeed the amount of CPU + GPU usage. It's an eternal battle to stay feature-rich and keep smooth texture downloads, but use as little power as possible. And of course don't draw any unnecessary frames.
What I managed to do with the entire Wasm rewrite operation is definitely get the number of CPU cycles down by a lot, and shorten the pathways between raw CPU and GPU operations.
Since this article I've already been able to do a lot more optimizations, because the resulting new code architecture is so much clearer than before. Funny how more minimal structures allow for better optimizing.
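(The "no unnecessary frames" part from above boils down to a dirty flag in the render loop; a generic sketch, not the actual Micrio code:)

    // Generic sketch of an on-demand render loop: only draw when something changed.
    let needsRender = false;

    function requestRender(): void {
      needsRender = true; // called on pan/zoom, texture arrival, resize, etc.
    }

    function draw(): void {
      /* issue the actual GL calls here */
    }

    function frame(): void {
      if (needsRender) {
        needsRender = false;
        draw();
      }
      requestAnimationFrame(frame); // an idle frame costs next to nothing
    }
    requestAnimationFrame(frame);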
Seconded: I had a blast reading this article. And I'd also be very curious for stats about power profile instead of raw speed (I'd expect them to be more or less correlated, but that might be a flawed assumption)
This is a very nice and thorough write-up. Frankly, I would've dropped IE support altogether, that browser has been dead a long long time. No need to futz with gzip and base64, just fetch the wasm, engines optimize for the common path.
I think the bigger sandbag is the single-file requirement. I understand why, for this type of project and the way it is often deployed by its users, that seems like a necessity, but at the same time we're about at the point where you can assume that ES2015+ browsers also support HTTP/2+, so individual file requests are no longer quite the same bottleneck they have been in HTTP/1.x.
In which case you could get away with a "loader" possibly as simple as one <script type="module" src="path/to/modern-lib"></script> and one <script>/* older browser fallback */</script>.

Though I'm surprised the author continued to support IE 10. IE 10 and below are quite dead:

https://techcommunity.microsoft.com/t5/microsoft-365-blog/mi...

https://www.w3counter.com/trends

https://analytics.usa.gov/

https://analytics.usa.gov/data/live/ie.json

Edit: Amazing to see there's still IE5 usage on US Gov analytics :)
Yeah, I still encounter IE10 and older sometimes -- looking at the wide audiences Micrio projects are used for, this ranges from grandparents' Windows XP machines to OS-locked government IE10 PCs.
Can't be helped, so an automatic fallback to the previous Micrio version is the least I can do.
I do really hope that there could be a _final_ update from MS for IE10 to at least support ES6. That would also make such a big difference.

One can wish..

Chrome 69.28%, Edge 7.75%, Firefox 7.48%, Internet Explorer 5.21%, Safari 3.73%, QQ 1.96%, Sogou Explorer 1.73%, Opera 1.12%, Yandex 0.90%, UC Browser 0.37%

I feel sorry for the people who continue working on IE because they think it's just as important as Edge/Safari/Firefox. :(
It's not dead, it's alive and well on millions of machines still perfectly functional and usable.
There are many developers, and thus services, who are not too lazy to support it. I've been able to achieve compatibility down to IE3, and hoping to go down all the way to 1.0, as a lone developer writing a relatively complex project.
Retro-computing is growing at remarkable rates. But aside from that, there are also people out there using older devices not for the retro-computing cool, but because that's what they have.
Telling them to upgrade is like telling a wheelchair user that they need to upgrade to legs. After all, wheelchair users are probably less than 1% of the population, right?
If I were you, as a developer, I would be just a little bit ashamed and embarrassed of the cop-out attitude displayed in your comment.
> I've been able to achieve compatibility down to IE3, and hoping to go down all the way to 1.0, as a lone developer writing a relatively complex project.
You are perfectly free to spend your time as you see fit; however, you might notice that many projects are dropping support for old browsers. This frees up developer resources, helps with writing cleaner code (e.g. CSS grid as opposed to tables or floats; JS modules rather than huge JS files or complex bundlers; web components instead of imperative handling of all the update logic), and opens up new possibilities in the browser, including WebAssembly. This is clearly a win for developers; but it also ends up being a win for users.
Comparing IE usage to wheelchair users is disingenuous and gross. No one is physically unable to move on from Internet Explorer 10. Times change, and standards do too.
It'd be interesting to get a true measure of the performance opportunities from WASM.
Every app has performance opportunities, but they're usually not related to the raw performance of the language; more often it's loading and backend latency.
People don't realize that V8 is really, really fast and always getting faster, and that WASM was never really super fast.
As V8 gets faster the delta opportunity for WASM is reduced at least in terms of raw, algorithmic performance.
I worked on V8, Wasm, and Wasm in V8. The two have separate goals. V8's priority for JavaScript is to run the trillions of lines of JavaScript code found in the wild well and to support the language's evolution. It is no longer primarily about top-level throughput performance on computational kernels. Wasm has the goal of high, predictable performance for numeric-heavy, legacy, and low-level code. Wasm is also focused on bringing more languages to the web than could reasonably be accommodated by a compile-to-JavaScript approach.
V8 is not the only implementation of WASM. Safari and Mozilla also provide implementations. Though Apple is probably dragging their feet here, Mozilla has been quite active with WASM development, supporting it in their browser, making sure Rust (which they created) can use it, and doing lots of developer support and outreach for WASM.
The word legacy is a bit biased here. From where I'm sitting, Javascript as implemented by browsers is increasingly the legacy language. Even most javascript webapps are actually transpiled to it. It's a compilation target more often than something people write natively. And as such it is used because until WASM came along, it was the only viable way to run anything on the web.
WASM provides developers a better compilation target that can ultimately support just about anything and already is used to run a wide variety of languages that have the ability to target it (in various stages of completion and maturity). Most mainstream languages at this point basically.
And of course WASM is not limited to the web. It's also showing up in edge computing solutions pushed by e.g. Digital Ocean, Cloudflare, etc. Many node.js applications can leverage WASM as well. It's probably used in a fair number of desktop applications built on Electron, like VS Code for example. People are even experimenting with embedding wasm in operating system kernels and firmware. It turns out that performance and sandboxing are very desirable properties in lots of places.
So, it's a general purpose runtime intended to sandbox and efficiently run just about any code. Including javascript interpreters ultimately. My prediction is that the latter will happen before the end of this decade and that Javascript will lose its status as a special language that is exclusively supported in browsers in addition to WASM. It will allow browser creators to focus on optimizing and securing WASM instead of WASM + Javascript.
"predictable performance for numeric-heavy, legacy"
You mean all that 'legacy' C++ meant for the web?
Yes, I see what someone might mean, e.g. AutoCAD being able to re-use a lot of code, but those are small use cases.
And as for 'predictability' ... is it really predictable in application, given the amount of quasi-compilation that has to happen to get to WASM, and, more important, does it matter if it's predictable if it's slow, or rather, V8 is 'fast enough' in comparison?
The fact is, WASM is a Zombie Technology. It's odd that it continues to exist; like Bitcoin, it has a few vested interests, and the 'dream' seems real, but there are just very few material use cases, and the practical reality of doing anything functional with WASM is very limited.
It's an intellectual concept, created without any specific customers/users in mind, and as such has very little adoption.
Because V8 is 'so good' and 'fit to purpose' and increasingly so (whatever stated objectives are), it means the real opportunity for WASM just fades.
Porting old C++ is a narrow case, and writing new C++ with a janky toolchain, and very limited ways to interact with networking, and UI ... all for what reason again?
V8 JS is equally fast or even faster for the same code after 3-4 runs. Cold paths, however, are usually 2 orders of magnitude slower than what you can achieve with WASM, for the same code. So it kind of depends on what you're building. Do you need a <5ms response at all times or can you live with an ~150ms response sometimes? 150ms randomly blocking code can be unacceptable for some cases and WASM doesn't suffer from that.
That said, you can't simply swap certain functions for WASM. There is an overhead for calling WASM functions, though very tiny - so if you're really going for performance then your app should be entirely living in WASM and interfacing directly with native browser APIs.
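You can see the warm-up effect with something as small as this (illustrative micro-benchmark only; absolute numbers will vary wildly per engine and machine):

    // The first run hits interpreted/baseline code; later runs hit the
    // optimizing tier, which is where "equally fast after 3-4 runs" comes from.
    function sumScaled(xs: Float64Array): number {
      let s = 0;
      for (let i = 0; i < xs.length; i++) s += xs[i] * 1.5;
      return s;
    }

    const data = new Float64Array(1_000_000).fill(1);
    for (let run = 0; run < 5; run++) {
      const t0 = performance.now();
      sumScaled(data);
      console.log(`run ${run}: ${(performance.now() - t0).toFixed(2)} ms`);
    }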
So that's a good point - however, in some recent benchmarks I've run comparing V8 against JS code running in GraalVM's JS module, it seems that V8 does move to that optimal state fairly quickly, and 3-4 runs for a specific line of code is usually a small price to pay, given that the alternative would be to use an entirely different stack which, even a decade in, barely integrates with the host environment - and one must account for the inevitable 'bridging' costs between the WASM and JS domains.
For example, some kind of 'pre optimized' 'JS engine byte code' that a regular JS engine could load as a module - which could be used by any other, regular JS code, would be considerably more optimal in terms of adapting to real world needs.
Oddly, that could probably be done right now if the world just happened to agree on a JS implementation like V8. I'm not suggesting we all do that, but at least we should be aware of the price we are paying.
I give WASM a 50/50 chance of becoming relevant, some of the new APIs may make a big difference, it remains to be seen if they do.
I'm curious to know more about their statement that loading textures in a WebWorker caused additional performance overhead versus loading them in the main thread. I have my own WebWorker-based texture loading system and it is significantly better than not using it.
At the risk of stating the obvious, as OP seems pretty well on top of the Web API game, did they remember to mark the texture data ArrayBuffer or ImageBitmap as transferable?
Though, I suppose my metric is "number of dropped frames", not "total time to load", and I haven't actually measured the latter. My project is a WebXR project, so the user is wearing a VR headset, wherein dropped frames are the devil.
I have to admit that I didn't dive 100% into what made this difference exactly. It just worked better without webworkers.
I just dived into the git history, and it turns out I didn't use the buffers as transferable. Perhaps that was it! I'll definitely check that out later, thank you for pointing this out!

Over multiple benchmark runs this results in another 32% CPU performance gain (WebWorkers vs single-thread texture downloads)! Thank you for pointing this out to me!

Not using transferables has HUGE performance hits; it can take even seconds to transfer larger buffers.
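For anyone else hitting this, the fix is just passing the transfer list as the second argument to postMessage; a minimal sketch with a hypothetical worker script and tile URL:

    // Main thread: hand the worker a URL, get the raw bytes back without a copy.
    const worker = new Worker('texture-loader.js'); // hypothetical worker script
    worker.onmessage = (e: MessageEvent<ArrayBuffer>) => {
      uploadToGl(e.data); // the ArrayBuffer was transferred, not structured-cloned
    };
    worker.postMessage('/tiles/0/0/0.jpg');

    // Inside texture-loader.js the key part is the transfer list:
    //   const buf = await (await fetch(url)).arrayBuffer();
    //   postMessage(buf, [buf]); // ownership moves to the main thread, no copy

    function uploadToGl(buf: ArrayBuffer): void {
      /* decode + createImageBitmap / texImage2D here */
    }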