What’s in that .wasm? Introducing wasm-decompile

This looks much nicer than the wasm2c output for that binary. I compiled it with `clang wasm.c -c -target wasm32 -O2` just like in the instructions (I'm on LLVM 10), and used the latest wasm2wat with `wasm2wat -f wasm.o` and got this instead:

  (module
    (type (;0;) (func (param i32 i32) (result f32)))
    (import "env" "__linear_memory" (memory (;0;) 0))
    (import "env" "__indirect_function_table" (table (;0;) 0 funcref))
    (func (;0;) (type 0) (param i32 i32) (result f32)
      (f32.add
        (f32.add
          (f32.mul
            (f32.load
              (local.get 0))
            (f32.load
              (local.get 1)))
          (f32.mul
            (f32.load offset=4
              (local.get 0))
            (f32.load offset=4
              (local.get 1))))
        (f32.mul
          (f32.load offset=8
            (local.get 0))
          (f32.load offset=8
            (local.get 1))))))
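
For reference, a C function of roughly the following shape would be expected to compile to a module like the one above: two pointer parameters, an `f32.load` from each at offsets 0, 4, and 8, three multiplies, and two adds. This is only a sketch. The names `vec3` and `dot` are assumptions, since the listing itself carries no symbol names.

```c
/* Sketch only: `vec3` and `dot` are assumed names, not taken from the listing. */
typedef struct {
    float x, y, z;  /* three 4-byte fields at byte offsets 0, 4, 8 */
} vec3;

/* (a.x*b.x + a.y*b.y) + a.z*b.z, the same nesting as the two f32.add nodes. */
float dot(const vec3 *a, const vec3 *b) {
    return a->x * b->x + a->y * b->y + a->z * b->z;
}
```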

wasm2c (also from WABT) returns this thing: https://paste.linux.community/view/7877995f

Aardappel · 5 years ago

wasm2c has a different objective though: its output has to be recompilable while preserving semantics. wasm-decompile was designed for readability first.

snazz · 5 years ago

Fair enough. I’m still surprised at just how unreadable (for me) the wasm2c output was, though. The compiler must have done quite a bit of optimizing that wasm2c was unable to undo.

I'm the author, if anyone has specific questions :)

6nf · 5 years ago

I notice that your code supports the 'name' custom section as expected, and furthermore you support a few other custom sections too, 'dylink' for example. Where did you find the documentation for these sections? The reason I ask is that I don't believe the official WebAssembly specs talk about those sections, so I guess they are somewhat compiler-specific?

Aardappel · 5 years ago

They are indeed not part of the spec, since they are somewhat tool-specific. For example, the linking symbol names are so far only consumed by LLD. Docs here: https://github.com/WebAssembly/tool-conventions/blob/master/...
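
While the contents of sections like 'dylink' are tool-specific, the container is generic: in the binary format a custom section is a section with id 0 whose payload starts with a length-prefixed name. A minimal sketch of walking a module and reporting those names follows. The helper names `read_u32_leb`, `name_visitor`, and `list_custom_sections` are made up for illustration and are not part of WABT or any other tool.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read an unsigned LEB128 u32 at buf[*pos]; returns 0 on truncated or overlong input. */
static int read_u32_leb(const uint8_t *buf, size_t len, size_t *pos, uint32_t *out) {
    uint32_t result = 0;
    for (unsigned shift = 0; shift < 35; shift += 7) {
        if (*pos >= len) return 0;
        uint8_t byte = buf[(*pos)++];
        result |= (uint32_t)(byte & 0x7f) << shift;
        if ((byte & 0x80) == 0) { *out = result; return 1; }
    }
    return 0;
}

/* Hypothetical callback type: receives a section name that is NOT NUL-terminated. */
typedef void (*name_visitor)(const char *name, uint32_t name_len, void *user);

/* Hypothetical helper: returns the number of custom sections, or -1 if malformed. */
int list_custom_sections(const uint8_t *buf, size_t len, name_visitor visit, void *user) {
    static const uint8_t header[8] = {0x00, 'a', 's', 'm', 0x01, 0x00, 0x00, 0x00};
    if (len < 8 || memcmp(buf, header, 8) != 0) return -1;

    int count = 0;
    size_t pos = 8;
    while (pos < len) {
        uint8_t id = buf[pos++];
        uint32_t size;
        if (!read_u32_leb(buf, len, &pos, &size) || size > len - pos) return -1;
        size_t end = pos + size;
        if (id == 0) {  /* section id 0 = custom section */
            size_t p = pos;
            uint32_t name_len;
            if (!read_u32_leb(buf, end, &p, &name_len) || name_len > end - p) return -1;
            if (visit) visit((const char *)(buf + p), name_len, user);
            count++;
        }
        pos = end;  /* skip the payload whether or not we understand it */
    }
    return count;
}
```

Passing `NULL` for `visit` just counts the custom sections; a real consumer would compare the name against the ones it knows and ignore the rest.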

ellis0n · 5 years ago

I'm new to the wasm code base. Where is the export code generation located? How complex would it be to rewrite the export? I want to export wasm to the .acpul programming language, to run wasm modules on the animation CPU platform. Links to architecture schemas and docs would be great.

klodolph · 5 years ago

This is fascinating. For various reasons, WASM is less like a target bytecode format and more like a peculiar IR for compilers. I’m sure this has all sorts of effects on the tooling.

k__ · 5 years ago

What's the difference?

If you were designing a bytecode as a compilation target, you would provide an easy correspondence in the bytecode to basic blocks.

See: https://en.wikipedia.org/wiki/Basic_block

WASM instead provides traditional control structures. So the compiler either has to preserve control structures through to the IR, or has to work backwards from basic blocks to control structures. Both options are undesirable, from the perspective of compiler writers, and would be unnecessary if the VM were a greenfield project.
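
To make the contrast concrete, here is the same loop written twice in C, as an illustration only (this is neither Wasm nor any real compiler IR, and both function names are made up). The first form uses a structured loop, the kind of shape Wasm expresses with `block`, `loop`, and `br_if`. The second is the basic-block shape, with labels and explicit jumps. Wasm has no arbitrary goto, so a compiler holding the second form has to recover something like the first before emitting code.

```c
/* Structured form: one loop construct with a single entry and a single exit. */
int sum_structured(const int *v, int n) {
    int total = 0;
    for (int i = 0; i < n; i++) {
        total += v[i];
    }
    return total;
}

/* Basic-block form: each label starts a block, each block ends in a jump. */
int sum_basic_blocks(const int *v, int n) {
    int total = 0;
    int i = 0;
header:
    if (i >= n) goto done;
    goto body;
body:
    total += v[i];
    i++;
    goto header;
done:
    return total;
}
```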

saagarjha · 5 years ago

Most bytecode isn’t optimized much.

mmastrac · 5 years ago

This is super handy. Pseudocode is very useful for understanding flow, so much more than actual assembly. I've always found it an order of magnitude easier to understand bad asm-to-C decompilation from IDA or Ghidra than perfect disassembly.

dlojudice · 5 years ago

> Decompile to what?

> `wasm-decompile` produces output that tries to look like a "very average programming language" while still staying close to the Wasm it represents.

> #1 goal is readability

> #2 goal is to still represent Wasm as 1:1 as possible

It seems AssemblyScript would do the job [1]

[1] https://assemblyscript.org/

AssemblyScript would certainly do worse at #2, and possibly also at #1. Translating to Wasm and translating from Wasm lead to different optimal designs; see, for example, how these two systems deal with loads and stores.

3pt14159 · 5 years ago

It would be nice if the decompiled output were runnable through an interpreter so you could step through it with a debugger of some kind and rename or annotate the variables and functions as you reverse engineer what is going on.

hardwaregeek · 5 years ago

Loving the tooling around wasm getting better. I've been debugging my compiler output with hexl-mode and reading the binary format and while it's not that bad, it'd be nice to do more advanced debugging with a text format.

There was a project I saw too that intended to visualize WebAssembly's execution. That'd be extremely helpful too.

cfallin · 5 years ago

> reading the binary format ... it'd be nice to do more advanced debugging with a text format.

Do you know about `wasm2wat` (from the WebAssembly binary toolkit, "WABT")? It produces a 1-to-1 text representation of the bytecode and is meant to always roundtrip via `wat2wasm` back to the same bytecode.

Yeah...I should probably use that. But does it work on mangled WASM? Part of the issue was that my compiler wasn't producing valid WASM

irrational · 5 years ago

When I first started learning JavaScript in the late 90s, the primary way I learned new things was from reading other people's code in my browser. Nowadays this isn't as easy, since you often have to run obfuscated code through a prettifier to get it back into a human-readable format, but it is still possible with some effort. I was concerned that WASM would make this impossible (despite the stated goal of "Be readable and debuggable — WebAssembly is a low-level assembly language, but it does have a human-readable text format (the specification for which is still being finalized) that allows code to be written, viewed, and debugged by hand."), but wasm-decompile gives me hope.
https://developer.mozilla.org/en-US/docs/WebAssembly/Concept...
