Thanks. We acknowledge that an LLM cannot fully replace human expertise in decompilation, much as GPT-4 has not achieved true human-like intelligence. However, the aim of our llm4decompile project is, in the same spirit as GPT-4, to offer assistance and enhance productivity in the decompilation process.
As for test suites, this is one of our project's main challenges: figuring out which functions satisfy the expectations of reverse engineers, how to autonomously produce high-coverage test suites, and how to objectively evaluate decompilation outcomes without relying solely on human judgment. Looking forward to your advice!
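One objective check in this spirit is re-executability: recompile the decompiled function and compare its behavior with the original over a test suite. A minimal sketch, where the "original" and the candidates are toy stand-ins for a compiled ground-truth function and recompiled decompiler outputs (not our actual harness):

```python
# Hedged sketch: judge a decompilation by behavior, not text similarity.
# The lambdas below are illustrative stand-ins for compiled functions.

def passes_test_suite(original, candidate, test_inputs):
    """True if the candidate matches the original on every test input."""
    return all(original(x) == candidate(x) for x in test_inputs)

original = lambda x: 2 * x + 1   # ground-truth behavior
good = lambda x: (x << 1) | 1    # different code, same behavior
bad = lambda x: 2 * x            # subtly wrong decompilation

print(passes_test_suite(original, good, range(100)))  # True
print(passes_test_suite(original, bad, range(100)))   # False
```

The catch, as noted above, is that the verdict is only as strong as the coverage of the test inputs.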
Thanks! The concern is how to uniformly uplift binary code from various architectures and configurations to a common IR such as RzIL. Is there a method to automate the disassembly process reliably across these different systems?
What do you mean? The Rizin code does all the hard parts: we add uplifting code for every architecture manually (you can see a list of supported architectures using `rz-asm -L` and check for the `I` letter, which means "IL"). You just need to call the necessary APIs; see, for example, how it's done in one of the integration tests [1]. As for using Rizin from Python, we have rz-bindgen [2][3].
Ideally, with a substantial dataset of obfuscated JavaScript and the corresponding original code, a language model could potentially make good predictions. The first key difficulty, however, is collecting such a large-scale dataset and setting up a system for automatic compilation and for segmenting out the binary-source pairs.
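The pairing step can be sketched simply once both sides are keyed by symbol name. A hypothetical sketch (real pipelines would extract symbols with tools like `nm` or `objdump`; here the inputs are given, and the byte strings are fabricated for illustration):

```python
# Hedged sketch: align source functions with compiled functions by symbol
# name, keeping only symbols present on both sides of the build.

def pair_by_symbol(source_funcs: dict, binary_funcs: dict):
    """Return (symbol, source, binary) triples for symbols in both maps."""
    shared = source_funcs.keys() & binary_funcs.keys()
    return [(name, source_funcs[name], binary_funcs[name])
            for name in sorted(shared)]

src = {"add": "int add(int a, int b) { return a + b; }",
       "sub": "int sub(int a, int b) { return a - b; }"}
bins = {"add": b"\x8d\x04\x37\xc3",   # fabricated bytes, for illustration
        "mul": b"\x0f\xaf\xc6\xc3"}

pairs = pair_by_symbol(src, bins)
print([name for name, _, _ in pairs])  # only "add" appears in both maps
```

Inlining, static functions, and stripped binaries are where this naive name-based alignment breaks down, which is part of why dataset construction is hard.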
Thanks! We're working on Ghidra/IDA Pro support. The problem we face is finding the right kind of data to test with and deciding how to evaluate it. It's like there's no "standard" benchmark/metrics that everyone uses for decompilation.
Thanks! But people want an all-in-one solution for decompilation. Given the vast array of architectures and compilation settings, and the fact that this information is usually not known in advance, finding a way to effectively navigate this complexity is quite difficult.
I don't have a toolchain; I am predicting research that will be able to detect the exact toolchain used from the binary alone. If you can detect the toolchain, then you can iterate to a fixed point (grind) until you recover a perfect copy of the source.
As others have said, the standardization of metrics is still debated, but at the same time this space has been explored by several top-tier papers that your paper does not cite. For example, DREAM [1] was evaluated using the classic metric of goto emission; Rev.ng [2] was evaluated using cyclomatic complexity and goto counts; and SAILR [3] was evaluated using the previous metrics plus a graph edit distance score over the structure of the code.
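For concreteness, two of those metrics are cheap to compute. A minimal sketch (the token-matching goto counter is deliberately crude, and the CFG edge/node counts are supplied by hand rather than extracted from a real decompiler):

```python
# Hedged sketch of two metrics mentioned above: goto count in decompiled C,
# and cyclomatic complexity M = E - N + 2P over a control-flow graph
# (E edges, N nodes, P connected components).

import re

def goto_count(c_source: str) -> int:
    # Crude token match; a real implementation would parse, not grep.
    return len(re.findall(r"\bgoto\b", c_source))

def cyclomatic_complexity(edges: int, nodes: int, components: int = 1) -> int:
    return edges - nodes + 2 * components

code = "if (x) goto fail; return 0; fail: goto cleanup; cleanup: return 1;"
print(goto_count(code))                         # 2
print(cyclomatic_complexity(edges=9, nodes=8))  # 3
```

Fewer gotos and lower complexity are the conventional proxies for "more readable" output in the papers above, which is exactly why dropping them needs an argument.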
I feel that dropping metrics established in prior peer-reviewed work without justification weakens your new metrics. However, I still think this is an interesting paper. It could just be made more legit by thoroughly reading and citing previous work in the area and building an argument for why you go against it.