Using QEMU-user emulation to reverse engineer binaries

If you're trying this, it's important to note that '-d in_asm' does not "trace with disassembly for each executed instruction" as the article suggests. It logs disassembly when the insns are translated (JITted); once that has happened, the insns can be executed many times and they won't reappear in the in_asm logs. The 'time of execution' logging is '-d cpu' and '-d exec', which just log 'we executed this translation block' without any disassembly -- you can then correlate those logs with the earlier 'in_asm' logging.
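A rough Python sketch of that correlation step, assuming both kinds of output went to one log (e.g. `-d in_asm,exec`). The regexes are assumptions modelled on the layout recent QEMU versions print: an `IN:` header followed by `0x<addr>:` lines for in_asm, and `Trace <cpu>: <host ptr> [<cs_base>/<guest pc>/...]` lines for exec. The format differs between versions and targets, so check it against what your build actually emits. The helper names and the sample log are hypothetical.

```python
import re
from collections import Counter

# ASSUMPTION: in_asm blocks start with an "IN:" line followed by lines of the
# form "0x<guest addr>:  <disassembly>" (exact layout varies by version/target).
INSN_RE = re.compile(r"^0x([0-9a-fA-F]+):\s+(.*)$")
# ASSUMPTION: '-d exec' lines look like
# "Trace <cpu>: <host ptr> [<cs_base>/<guest pc>/<flags>/<cflags>] <symbol>".
TRACE_RE = re.compile(r"^Trace \d+: 0x[0-9a-fA-F]+ \[[0-9a-fA-F]+/([0-9a-fA-F]+)/")


def parse_blocks(lines):
    """Map the first guest PC of each translated block to its disassembly."""
    blocks = {}
    current = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("IN:") or not line.strip():
            current = None  # the next address line opens a new block
            continue
        m = INSN_RE.match(line)
        if m:
            pc = int(m.group(1), 16)
            if current is None:
                current = pc
                blocks[current] = []  # a retranslation replaces older output
            blocks[current].append(m.group(2).strip())
    return blocks


def replay(lines, blocks):
    """Yield (guest_pc, disassembly) for every executed block, in order."""
    for line in lines:
        m = TRACE_RE.match(line)
        if m:
            pc = int(m.group(1), 16)
            yield pc, blocks.get(pc, ["<no in_asm entry>"])


def exec_counts(lines):
    """Count how many times each block start PC was executed."""
    return Counter(pc for pc, _ in replay(lines, {}))


# Made-up sample: one block translated once, then executed twice.
SAMPLE_LOG = [
    "IN: main",
    "0x00401000:  push %rbp",
    "0x00401001:  mov %rsp,%rbp",
    "",
    "Trace 0: 0x7f0000000100 [00000000/0000000000401000/00000000/00000000] main",
    "Trace 0: 0x7f0000000100 [00000000/0000000000401000/00000000/00000000] main",
]
```

On the sample, the block's disassembly appears once under in_asm, while `exec_counts(SAMPLE_LOG)` reports two executions of PC 0x401000, which is exactly the gap described above.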

This is the most notable manifestation of the general point that QEMU's -d logging is primarily aimed at debugging QEMU itself -- it logs things that are easy to log and interpreting the output requires some understanding of QEMU's internals. The "I want to debug my guest without thinking about QEMU implementation details" interfaces are the gdbstub and more recently the TCG plugin API.

moyix · 5 years ago

Another helpful option if you're trying to get traces out is `-d nochain`, which turns off translation block chaining (chaining inserts a direct jump from one block to the next, which can cause logging statements to be skipped).
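For example, a small helper that assembles a qemu-user command line combining the log items mentioned in this thread. `-d` takes a comma-separated list of log items and `-D` sends the log to a file instead of stderr; the function name, its defaults and the `qemu-x86_64` / `./a.out` names are hypothetical placeholders.

```python
import shlex


def qemu_trace_argv(qemu, program, args=(), logfile="qemu.log", chain=False):
    """Build a qemu-user command line that logs block translation and execution.

    Hypothetical helper: 'qemu' is e.g. "qemu-x86_64", 'program' the guest binary.
    """
    items = ["in_asm", "exec"]
    if not chain:
        # Disable translation block chaining so that blocks reached via a
        # direct jump from another block still produce an exec log line.
        items.append("nochain")
    return [qemu, "-d", ",".join(items), "-D", logfile, program, *args]


print(shlex.join(qemu_trace_argv("qemu-x86_64", "./a.out")))
# qemu-x86_64 -d in_asm,exec,nochain -D qemu.log ./a.out
```

Passing the list to `subprocess.run(argv, check=True)` would then run the guest. Expect `nochain` to cost some speed, since every block transition has to go back through QEMU's main loop instead of jumping straight to the next block.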

Also, if anyone is interested in using QEMU for whole system reverse engineering, allow me to shill PANDA, which adds a plugin API, record/replay, and a nice Python interface for all of this:

https://panda.re/

> The main use case for qemu-user is.. running programs for one CPU architecture on another.. most people don’t realize that you can run a qemu-user emulator which targets the same architecture as the host.

I believe there is one important caveat here: the OS has to stay the same.

https://superuser.com/questions/1355064/qemu-user-mode-emula...

AnIdiotOnTheNet · 5 years ago

True, but still neat. Frankly I think this kind of functionality should be built in to OSs. Why shouldn't I expect to be able to run code written for the same OS on two different architectures without recompilation?

remram · 5 years ago

The kernel supports that via the binfmt_misc feature. On Debian you just have to install the packages: https://wiki.debian.org/QemuUserEmulation
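binfmt_misc works by matching the first bytes of an executable against registered magic/mask patterns and handing the file to the registered interpreter (here, a qemu-user binary). A rough Python analogue of that dispatch, reading the ELF header's EI_CLASS, EI_DATA and e_machine fields, might look like this. The e_machine numbers are the standard ELF values; the table is a small illustrative subset, and the function name is hypothetical.

```python
import struct

# (e_machine, EI_CLASS, EI_DATA) -> qemu-user binary name.
# e_machine values come from the ELF spec / <elf.h>; this table is a small,
# illustrative subset, not an exhaustive mapping.
QEMU_BY_ELF = {
    (3, 1, 1): "qemu-i386",       # EM_386
    (4, 1, 2): "qemu-m68k",       # EM_68K (big-endian)
    (40, 1, 1): "qemu-arm",       # EM_ARM, little-endian
    (40, 1, 2): "qemu-armeb",     # EM_ARM, big-endian
    (42, 1, 1): "qemu-sh4",       # EM_SH, little-endian
    (42, 1, 2): "qemu-sh4eb",     # EM_SH, big-endian
    (62, 2, 1): "qemu-x86_64",    # EM_X86_64
    (183, 2, 1): "qemu-aarch64",  # EM_AARCH64, little-endian
    (243, 1, 1): "qemu-riscv32",  # EM_RISCV, 32-bit
    (243, 2, 1): "qemu-riscv64",  # EM_RISCV, 64-bit
}


def qemu_for_elf(header: bytes) -> str:
    """Pick a qemu-user binary from the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # EI_CLASS: 1 = 32-bit, 2 = 64-bit; EI_DATA: 1 = little-, 2 = big-endian.
    ei_class, ei_data = header[4], header[5]
    fmt = "<H" if ei_data == 1 else ">H"
    (machine,) = struct.unpack_from(fmt, header, 18)  # e_machine at offset 18
    try:
        return QEMU_BY_ELF[(machine, ei_class, ei_data)]
    except KeyError:
        raise ValueError(f"no mapping for e_machine={machine}") from None


# Usage: with open("./a.out", "rb") as f: print(qemu_for_elf(f.read(20)))
```

The kernel does the equivalent check with a raw byte mask rather than by decoding fields, but the idea is the same: the ELF header alone tells you which emulator to run.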

pm215 · 5 years ago

cbmuser · 5 years ago

> For those purposes, qemu-user works quite well: we are even considering using it to build the entire riscv64 architecture in the 3.15 release.

We've been doing that in Debian on m68k and sh4 for quite a while now, and it has helped find quite a lot of bugs, both in the target-specific emulation code and in the qemu-user code.

There is still one nasty bug in conjunction with glibc if anyone wants to help:

> https://sourceware.org/bugzilla/show_bug.cgi?id=23960

nenolod · 5 years ago

With musl, we discovered that qemu does not always properly initialize structures when doing the syscall translation. Most likely an initialization is simply missing somewhere.

saagarjha · 5 years ago

> For example, we can learn how a CPU would break a program down into translation buffers full of micro-ops

Note these are QEMU’s micro-ops, not real CPU μops as is suggested here.

TCG's uops are modelled after real ones. It is close enough.

Close enough for…what possible purpose? If I were interested in μops I would want them to actually match the real ones…

akkartik · 5 years ago

OldGoodNewBad · 5 years ago

We sometimes use bochs for this.

tux3 · 5 years ago

Aah, the bochs builtin debugger. Fond memories...

lqqq · 5 years ago

I use qemu-user to run applications. For programming and RE, qiling is more suitable.
