The bare basics are working with a hex editor and understanding data types - ints, floats, null-terminated strings, length-prefixed strings etc.
I'd recommend taking a well documented binary file format (Doom WAD file?), going over the documentation, and checking that you can pick out the individual values in the hex editor.
Now, after you have a feel for how things might look in hex, look at your own file. Start by saving an empty project from your program and identifying the header, maybe it's compressed?
If it's not, change a tiny thing in the program and save again, compare the files to see what changed. Or alternatively change the file a tiny bit and load it.
Write a parser and add things as you learn more. If the file isn't intentionally obfuscated, it should probably be just a matter of persevering until you can parse the entire file.
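To make the "write a parser and grow it" step concrete, here's a minimal sketch using Python's struct module. The header layout (a 4-byte magic, then two little-endian uint32s) is entirely made up for illustration; your file will have its own layout that you discover field by field.

```python
import struct

def parse_header(data: bytes):
    """Parse a hypothetical 12-byte header: a 4-byte magic string,
    a little-endian uint32 version, and a uint32 record count."""
    magic, version, count = struct.unpack_from("<4sII", data, 0)
    return magic, version, count

# Fabricate a header as it might appear in a hex editor
blob = b"PROJ" + struct.pack("<II", 2, 5)
print(parse_header(blob))  # (b'PROJ', 2, 5)
```

As you identify more fields, you extend the format string and add assertions (e.g. that the magic matches), which doubles as a regression test for your understanding of the format.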
Thanks. That is kind of what I imagined. But I am not good at interpreting the information in the hex editor. Reading the article I was a bit lost with terms like little-endian, and thought that might be an important concept to know for the task. I guess that is what I should learn first.
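Endianness is less scary than it sounds: it's just whether the least-significant byte of a multi-byte number comes first in the file (little-endian) or last (big-endian). A quick way to see it, sketched in Python:

```python
import struct

value = 0x12345678

little = struct.pack("<I", value)  # least-significant byte first
big = struct.pack(">I", value)     # most-significant byte first

print(little.hex())  # 78563412  <- what you'd see in a hex editor
print(big.hex())     # 12345678
```

Most files produced on x86/ARM machines are little-endian, so a value like 2 shows up in the hex editor as `02 00 00 00`.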
Given the current reach of the project (read: still small!), I suspect for a while yet the majority of successfully funded testing will be by concerned individuals with disposable income. It is cheaper and much faster to go through laboratory.love than it would be to partner with a lab as an individual (plus the added bonus that all data is published openly).
I've yet to have any product funded by a manufacturer. I'm open to this, but I would only publish data for products that were acquired through normal consumer supply chains anonymously.
Porting my binary & decimal palindromes[0] finding code[1] to CUDA, with which I had no experience before starting this project.
It's already working, and slightly faster than the CPU version, but that's far from an acceptable result. The occupancy (which is a term I first learned this week) is currently at a disappointing 50%, so there's a clear target for optimisation.
Once I'm satisfied with how the code runs on my modest GPU at home, the plan is to use some online GPU renting service to make it go brrrrrrrrrr and see how many new elements I can find in the series.
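For anyone curious what the series looks like, here's a tiny CPU reference in Python (not the author's CUDA code, just a sketch of the definition): numbers that read the same forwards and backwards in both base 10 and base 2.

```python
def is_palindrome(s: str) -> bool:
    return s == s[::-1]

def dual_palindrome(n: int) -> bool:
    """True if n is a palindrome in both decimal and binary."""
    return is_palindrome(str(n)) and is_palindrome(format(n, "b"))

# First few elements of the series
print([n for n in range(1, 200) if dual_palindrome(n)])
# [1, 3, 5, 7, 9, 33, 99]
```

The elements thin out fast as n grows, which is why brute-forcing deep into the series is a natural fit for a GPU.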
O(n^2) isn't required. One could do an in-place merge sort, which also has a guaranteed worst case, but at O(n*log(n)).
I suspect everyone turns to Bubblesort since the inputs are small enough that it doesn't matter (evidenced by the fact that it finishes within microseconds).
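For reference, a bottom-up merge sort sketch showing the O(n*log(n)) worst case. Genuinely in-place merging is possible but intricate (block-merge techniques), so this sketch uses a scratch buffer for clarity:

```python
def merge_sort(a: list) -> list:
    """Bottom-up merge sort: O(n*log(n)) even in the worst case.
    Uses a scratch buffer; truly in-place variants exist but are fiddly."""
    n = len(a)
    src, dst = a[:], a[:]
    width = 1
    while width < n:
        # Merge adjacent runs of length `width` from src into dst
        for lo in range(0, n, 2 * width):
            mid = min(lo + width, n)
            hi = min(lo + 2 * width, n)
            i, j = lo, mid
            for k in range(lo, hi):
                if i < mid and (j >= hi or src[i] <= src[j]):
                    dst[k] = src[i]; i += 1
                else:
                    dst[k] = src[j]; j += 1
        src, dst = dst, src
        width *= 2
    return src

print(merge_sort([5, 3, 8, 1, 2]))  # [1, 2, 3, 5, 8]
```

For the tiny inputs in question, though, the constant factors dominate and Bubblesort's simplicity wins on programmer time.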
You could copy the instruction to a 16-byte buffer and hash the resulting one or two int64s. Looking at the code sample in the article, there wasn't a single instruction longer than 5 characters, and I suspect that in general instructions with short names are more common than those with long names.
This last fact might actually support the current model, as it grows linearly-ish in the length of the instruction, instead of being constant like a hash.
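The 16-byte-buffer idea sketched in Python (the mnemonics and the 16-char limit are assumptions for illustration): pad the name into a fixed buffer and reinterpret it as two uint64s, so comparison and hashing cost the same regardless of name length.

```python
import struct

def mnemonic_key(mnemonic: str) -> tuple:
    """Pad an instruction mnemonic into a fixed 16-byte buffer and
    reinterpret it as two little-endian uint64s. Comparing or hashing
    the pair is constant-time; assumes names are at most 16 chars."""
    buf = mnemonic.encode("ascii").ljust(16, b"\x00")
    return struct.unpack("<QQ", buf)

print(mnemonic_key("mov") == mnemonic_key("mov"))    # True
print(mnemonic_key("mov") == mnemonic_key("movzx"))  # False
```

In C you'd do the same with a zero-padded char[16] and two uint64_t loads, letting the compiler turn the comparison into a couple of register compares.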
Yes, and I would even go as far as saying that even being functional isn't required. Trying to make something cool and failing counts as "hacker spirit".
It all boils down to getting your hands dirty, instead of passively consuming the products of others.