psifertex · 3 years ago
Sorry for the outages, friends. We're actively working on getting it able to handle higher load but we knew that if we hit HN we'd be swamped no matter what we did. We're spinning up more workers and fixing obvious perf issues as we see them, but if it's not available when you try it make sure to check back later!
athrowaway3z · 3 years ago
I really appreciate these kinds of websites.

But I wonder if we'll eventually go full circle, and it becomes easier and cheaper to send a wasm Linux kernel with virtual disk access over websockets instead of processing stuff server side.

rfoo · 3 years ago
IDA Pro and Binary Ninja, listed here, are proprietary and require expensive licenses to run.

meibo · 3 years ago
I feel like the decompiler space is a little stuck? I mostly go with Hex-Rays out of habit and because I'm used to IDA, but I haven't really seen x64 decompiler output noticeably improve in recent releases.

A lot of my colleagues use Ghidra a lot now and complain about its decompiler regularly.

Is there any new approach in the works? Maybe something ML-based for optimization? Would be sad if Hex-Rays output is "as good as it's gonna get".

tralarpa · 3 years ago
> A lot of my colleagues use Ghidra a lot now and complain about its decompiler regularly.

Are your colleagues decompiling obfuscated code (for example malware)? Publicly available decompilers don't work well for that, but I assume that many specialists have their own little improvements and plugins that they don't share with others because it's their core business.

For non-obfuscated code, Ghidra has served me very well, even for entire applications. Often, it has to be pushed in the right direction (for example, by manually specifying the type of a variable) and it sometimes misses some obvious simplifications, especially when arrays are involved, but I think those issues could be solved relatively easily by polishing/extending its heuristics. Nothing where I would say that ML is needed, although it would be possible. In the end, most programs contain the same patterns and an ML-based system could help identify them.

But yeah, obfuscated code, that's something else. There are some academic publications about the usage of ML for that. No idea what's happening inside the company labs, though.

badsectoracula · 3 years ago
I haven't used Ghidra "seriously" but I fed it some non-trivial programs I wrote in Free Pascal and I was very surprised to see that it recreated a C++ program that was incredibly similar to what the Free Pascal program looked like.

Of course it wasn't obfuscated and there were a couple of mistakes here and there, but overall it'd work perfectly fine for someone to understand what the program was doing if they didn't have access to the source code.

aardshark · 3 years ago
From my limited experience with Ghidra, it didn't do great once the code was not using standard calling conventions (i.e., it was probably compiled with optimization flags).

Sometimes it would just straight up ignore (functional) assembly for apparently no reason. Or it would turn simple code into a myriad of nested conditionals and loops, achieving the same goal, but looking nothing like a human would write.

It was still very helpful in understanding blocks of assembly much faster than I otherwise would, and it's possible I was lacking some configuration that a more experienced user could do to help the decompiler out.

ishitatsuyuki · 3 years ago
Rellic [1] implements an algorithm that generates goto-free control flows (citation in README), which would be a significant improvement over what Ghidra/IDA generate currently.

Unfortunately it looks like the maintenance state of the pieces around Rellic isn't very good, and it's quite rocket science to get it building. It doesn't have as much UI/GUI as Ghidra either so it's a bit far from accessible right now.

[1]: https://github.com/lifting-bits/rellic

dataflow · 3 years ago
> that generates goto-free control flows

...note: from LLVM bitcode.

FrozenVoid · 3 years ago
What happens with code that uses lots of gotos (incl. computed gotos)?
zozbot234 · 3 years ago
> Rellic [1] implements an algorithm that generates goto-free control flows

Doesn't WebAssembly implement that already, via Relooper?

hoosieree · 3 years ago
> Is there any new approach in the works? Maybe something ML-based for optimization?

I'm doing a PhD on this.

My goal is to detect known functions from obfuscated binaries.

The biggest challenge by far is building a good dataset. Unlike computer vision (millions of pictures with the label "dog") the number of training examples for a typical function is one. For now I'm focusing on C standard libraries, since there are a handful of real-world implementations plus some FOSS or student samples available for things like strlen and atoi.

If anyone wants to collaborate, feel free to message me.

anitil · 3 years ago
I'm not sure I follow - wouldn't many statically linked programs have much of some version of libc within them? So you could take any program, change it to be statically linked and use that for training?

That said I assume I'm missing something here.

Akronymus · 3 years ago
Could a best guess + fuzzing + compiling the decompiled code work towards a heuristic?
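One way to read the "best guess + fuzzing" idea is as differential testing: run the original function and a recompiled guess on the same random inputs and score how often they agree. Below is a minimal sketch under loud assumptions. All names (`reference_atoi`, `good_guess`, `bad_guess`, `agreement_score`, `make_input`) are hypothetical, and plain Python functions stand in for the original binary and for recompiled decompiler output; a real setup would have to call into native code instead.

```python
import random
import re


def reference_atoi(text: str) -> int:
    """Hypothetical stand-in for the function inside the original binary."""
    i, sign, value = 0, 1, 0
    if i < len(text) and text[i] in "+-":
        sign = -1 if text[i] == "-" else 1
        i += 1
    while i < len(text) and "0" <= text[i] <= "9":
        value = value * 10 + int(text[i])
        i += 1
    return sign * value


def good_guess(text: str) -> int:
    """Hypothetical decompiler guess that is written differently but behaves the same."""
    match = re.match(r"([+-]?)([0-9]*)", text)  # always matches, possibly empty
    sign, digits = match.group(1), match.group(2)
    value = int(digits) if digits else 0
    return -value if sign == "-" else value


def bad_guess(text: str) -> int:
    """Hypothetical guess that is deliberately wrong: it drops the sign."""
    return abs(good_guess(text))


def make_input(rng: random.Random) -> str:
    """Random atoi-style input: optional sign, 0-5 digits, optional junk suffix."""
    sign = rng.choice(["", "+", "-"])
    digits = "".join(rng.choice("0123456789") for _ in range(rng.randint(0, 5)))
    suffix = rng.choice(["", "x", " 7"])
    return sign + digits + suffix


def agreement_score(reference, candidate, make_input, trials=1000, seed=0) -> float:
    """Fraction of random inputs on which the candidate matches the reference."""
    rng = random.Random(seed)
    matches = 0
    for _ in range(trials):
        arg = make_input(rng)
        if reference(arg) == candidate(arg):
            matches += 1
    return matches / trials


if __name__ == "__main__":
    print("good guess:", agreement_score(reference_atoi, good_guess, make_input))
    print("bad guess: ", agreement_score(reference_atoi, bad_guess, make_input))
```

A score of 1.0 only says the two agreed on the sampled inputs, not that they are equivalent, so this is a heuristic signal rather than a proof.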
develatio · 3 years ago
I hear good things about Binary Ninja!
baby · 3 years ago
I always found it odd that IDA Pro was such a pile of poop when it probably made sooo much money
ohnoesjmr · 3 years ago
Decompiler space probably has a few tens of millions in revenue yearly, yet writing a good decompiler is quite a lot of engineering effort, and you are not going to spend tons of money and effort to capture a measly 10m market, you'll rather be the next uber type thing that targets a much bigger market.

Hence Hex-Rays can get away with not doing much and just collecting license fees from existing customers yearly, as there isn't a better alternative anyway.

pjc50 · 3 years ago
One of the major categories of users is people in the warez scene, all of whom are pirating it. The only other one is security researchers, which is a pretty small market.
lalopalota · 3 years ago
IDA Pro was an amazing disassembler and accompanying set of tools - top of the pack for quite a while.
ykl · 3 years ago
Love the joke in the URL. :)

(For anyone that doesn't get it; it's a play on Godbolt)

jraph · 3 years ago
What is really funny about this is that Godbolt is the last name of the Compiler Explorer's author. But it seems like it is a brand, a word now.

Being able to swap two letters from a name and get something nice like this is lucky.

Godbolt is quite a name.

mdp2021 · 3 years ago
Not just any swap: it's mirroring - for the reverse direction.
psifertex · 3 years ago
Thanks! We debated it some internally and I'm glad it won out, I think it's worth it. Plus, it has a nice logo that goes with it.
mwcampbell · 3 years ago
Can any of these decompilers make effective use of a Microsoft PDB file, if I have one, to include original symbols in the decompiled output? What I'd really like to do with a decompiler is feed it a final compiled EXE or DLL of my own code and see what it looks like after it's been run through whole-program optimization. In that case, of course, I have a PDB file.
ok123456 · 3 years ago
IDA does.
psifertex · 3 years ago
Binary Ninja can as well (sorry for the delay, been on vacation this week) though none of the tools will download and use PDBs that might be available via public servers or otherwise by default in the configuration we're using on dogbolt. It would potentially be possible but our goal isn't to provide a test of all tools in all possible configurations as much as it is to get a good overview. Once you start tweaking each tool differently you're better off running that sort of analysis locally.
spaintech · 3 years ago
Ha! That was funny. I wonder, though: being fed tons of code, couldn't Godbolt leverage code -> compiler object -> assembly as a means to train an AI decompiler? Food for thought.
KMnO4 · 3 years ago
I've always wondered about this. Compilers do a LOT of irreversible stuff. For example, symbol names usually aren't needed (unless you have a reflective language).

Where AI would really shine is reversing the (only seemingly reversible) optimizations. For example, GCC converts "x * 14" into "(x << 4) - x - x". Of course, you can never be 100% sure the programmer didn't actually want "shift left by four followed by two subtractions", but I'm convinced that 99% of the code I write is fairly predictable and statistically similar to whatever giant codebase you train it on.
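To make the arithmetic concrete: `(x << 4) - x - x` is `16x - 2x`, which is `14x`. A minimal Python sketch checking that the two forms agree follows. The function names are made up for illustration, and the `to_int32` helper is an assumption standing in for fixed-width registers, since Python integers don't wrap on their own.

```python
def times_14_multiply(x: int) -> int:
    """The form a programmer would most likely write."""
    return x * 14


def times_14_shifts(x: int) -> int:
    """The shift-and-subtract form quoted above: 16x - x - x."""
    return (x << 4) - x - x


def to_int32(value: int) -> int:
    """Wrap a Python int to a signed 32-bit value (assumed register width)."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def forms_agree(samples) -> bool:
    """True if both forms give the same 32-bit result for every sample."""
    return all(
        to_int32(times_14_multiply(x)) == to_int32(times_14_shifts(x))
        for x in samples
    )


if __name__ == "__main__":
    samples = [0, 1, -1, 7, 12345, -98765, 2**31 - 1, -(2**31)]
    print(forms_agree(samples))  # prints True
```

Going the other way is the hard part: many different source expressions collapse to the same shift sequence, so a decompiler (ML-based or not) can only pick the most plausible one.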

lwswl · 3 years ago
Symbol names could be inferred from context
sargun · 3 years ago
Throwing AI at the problem might not actually be the worst suggestion. I wonder how the likes of copilot model the AST. Heh, you might even be able to build an approximation of a compiler using AI.
tralarpa · 3 years ago
I think it would be easier and faster to just take the millions of open source projects on github for that :)
unsafecast · 3 years ago
...which don't have binaries. It's easier for Godbolt, since the whole purpose of the website is to compile and show output. If you crawl GitHub you need to compile the projects yourself, much more difficult.
planede · 3 years ago
Just take all of Debian packages, or something like that.
thesz · 3 years ago
The maximum executable size is 2 MB, so it is not possible to torture it with a GHC-compiled Haskell program.
no_time · 3 years ago
IDA license sponsored by "Yiang Ling Personal License"?

EDIT: Site has changed in multiple ways in the last 30 minutes I've been trying to submit my sample. Best of luck in keeping up with demand.

psifertex · 3 years ago
Nope, Ilfak gave us a license for it and as Binary Ninja devs we're using a legitimate licensed copy of Binary Ninja as well. All above board, and we're hoping to add more commercial decompilers in the future as well, as we can integrate them and as the companies behind them are willing.

RE: Demand. We just got 2x the workers but as the East Coast wakes up I'm not confident it'll hold up too well, several of the decompilers are... VERY resource intensive so there's really no good way without an exorbitant amount of compute to scale to heavy demand.

Eventually a better queue system with better pre-processing to filter invalid things is on our todo list.
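For a sense of what "pre-processing to filter invalid things" could look like, here is a minimal sketch that rejects uploads before they ever reach a decompiler worker. It is not how dogbolt does it: the `classify_upload` name, the table of accepted formats, and the exact byte cap are all assumptions for illustration (the cap just mirrors the 2 MB limit mentioned in this thread). The magic numbers themselves are the standard ones for ELF, PE/DOS, and thin Mach-O files.

```python
from typing import Optional

# Assumed cap, mirroring the 2 MB limit mentioned in this thread.
MAX_UPLOAD_BYTES = 2 * 1024 * 1024

# Leading bytes of common executable formats as they appear on disk.
MAGIC_PREFIXES = {
    b"\x7fELF": "ELF",
    b"MZ": "PE/DOS",
    b"\xfe\xed\xfa\xce": "Mach-O 32-bit (big-endian)",
    b"\xce\xfa\xed\xfe": "Mach-O 32-bit (little-endian)",
    b"\xfe\xed\xfa\xcf": "Mach-O 64-bit (big-endian)",
    b"\xcf\xfa\xed\xfe": "Mach-O 64-bit (little-endian)",
}


def classify_upload(data: bytes) -> Optional[str]:
    """Return a format name if the upload looks like an executable, else None.

    This only checks size and leading magic bytes; it does not validate
    the rest of the file, so it is a cheap first filter, nothing more.
    """
    if len(data) > MAX_UPLOAD_BYTES:
        return None
    for magic, name in MAGIC_PREFIXES.items():
        if data.startswith(magic):
            return name
    return None


if __name__ == "__main__":
    print(classify_upload(b"\x7fELF" + b"\x00" * 12))  # prints ELF
    print(classify_upload(b"just some text"))          # prints None
```

A check like this is cheap enough to run before queueing, which matters when the expensive step is the decompilers themselves.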

rfoo · 3 years ago
Unrelated, but it's amazing that over the years I have seen all kinds of misspellings of "Jiang Ying" and the "ang ing" part is always right. :P
unnouinceput · 3 years ago
HN crowd decompiled the website
psifertex · 3 years ago
Yeah, sorry about that. We're working on getting it up again but no promises. I'm on vacation in Europe while the rest of the team is about to head to sleep so might be a bit before we have it more stable.
