troelsSteegin · 3 years ago
This work reads to me like a neat, smart hack. It's the kind of thing I miss reading about in more ambitious systems and learning papers. It "magically" reverses elements of a transformation. If I read it right, it follows a pattern of taking a structured input (source code), an unobserved but consistent transformation (compilation), and an output with latent structure (the binary). They trained a transformer model to extract features from the output and label them with features that make sense in the representation of the input. This gets harder if the transformation is noisy, for example from differing inlining behavior, and harder still if the input's feature space lacks labels that make sense to the model's consumer. What they did seems like fun.
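
A minimal sketch of that kind of setup, assuming a toy seq2seq transformer in PyTorch with placeholder vocabularies and random stand-in data; this is only an illustration of the "binary tokens in, source-level labels out" framing, not the paper's actual pipeline:

```python
# Hypothetical sketch: a small encoder-decoder transformer that maps
# tokenized binary fragments (the transformation's output) to source-level
# label tokens (features of the input representation). All sizes and data
# here are illustrative placeholders.
import torch
import torch.nn as nn

BIN_VOCAB, SRC_VOCAB, D_MODEL = 512, 256, 128

class Bin2SrcLabeler(nn.Module):
    def __init__(self):
        super().__init__()
        self.bin_embed = nn.Embedding(BIN_VOCAB, D_MODEL)
        self.src_embed = nn.Embedding(SRC_VOCAB, D_MODEL)
        self.transformer = nn.Transformer(
            d_model=D_MODEL, nhead=4,
            num_encoder_layers=2, num_decoder_layers=2,
            batch_first=True)
        self.out = nn.Linear(D_MODEL, SRC_VOCAB)

    def forward(self, bin_tokens, src_tokens):
        # Causal mask so the decoder predicts each label token left to right.
        tgt_mask = nn.Transformer.generate_square_subsequent_mask(
            src_tokens.size(1))
        hidden = self.transformer(
            self.bin_embed(bin_tokens),   # encode the compiled output
            self.src_embed(src_tokens),   # decode source-level labels
            tgt_mask=tgt_mask)
        return self.out(hidden)           # logits over the label vocabulary

model = Bin2SrcLabeler()
bin_tokens = torch.randint(0, BIN_VOCAB, (8, 64))  # toy binary token ids
src_tokens = torch.randint(0, SRC_VOCAB, (8, 16))  # toy label token ids
logits = model(bin_tokens, src_tokens)
loss = nn.functional.cross_entropy(
    logits.reshape(-1, SRC_VOCAB), src_tokens.reshape(-1))
loss.backward()  # one illustrative training step (optimizer omitted)
```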
JPLeRouzic · 3 years ago
> They trained a transformer model in order to extract features from the output and label them with features that make sense in the representation of the input.

I wonder what would happen if this kind of trick were used to "reverse engineer" a specific cryptographic hash function?

These kinds of transformations are not done at random; it's just that, for a given output, we can't infer the input. The model wouldn't even have to recover the most probable input: any valid collision would be enough. It would be a new kind of differential cryptanalysis.
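
As a rough illustration of that framing, here is a minimal sketch of the data side of such an experiment, assuming SHA-256 as the target hash (the comment names none): generating (digest, preimage) training pairs is trivial, but cryptographic hashes are designed precisely so that these pairs carry no structure a model could exploit, which is why such an attack would be remarkable if it worked:

```python
# Hedged sketch of the thought experiment: frame collision-finding as
# supervised learning over (digest, preimage) pairs. SHA-256 is an
# assumed stand-in for "a specific cryptographic hash function"; by
# design, a model trained on these pairs should do no better than guessing.
import hashlib
import os

def make_pair(n_bytes: int = 8) -> tuple[bytes, bytes]:
    """Sample a random preimage and return a (digest, preimage) pair."""
    preimage = os.urandom(n_bytes)
    digest = hashlib.sha256(preimage).digest()
    return digest, preimage

# Toy dataset: the model would learn to map digest -> any valid preimage;
# as the comment notes, a collision suffices, not the original input.
dataset = [make_pair() for _ in range(10_000)]
```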