Does it just download pre-trained DALL-E Mini models and generate images using them? Because I can't seem to find any logic in that repo other than that. I'm not into that field, just curious if I'm missing something.
To add to the sibling comment. The challenge is not converting the weights as such. Pre-trained model weights are just arrays of numbers with labels that identify which layer/operation they correspond to in the source model. The challenge is expressing the model in code identically between two frameworks and programmatically loading the original weights in, since these models can have hundreds of individual ops. Hence why you can't just load a PyTorch model in Tensorflow or vice versa.
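To make this concrete, a weight converter is essentially a renaming-plus-relayout loop over named arrays. The names, shapes, and transpose below are illustrative (a hypothetical Linear/Dense layer, using numpy dicts as stand-ins for real framework objects), not taken from any actual model:

```python
import numpy as np

# A PyTorch-style state dict is just named arrays (shapes illustrative).
src_weights = {
    "encoder.linear.weight": np.ones((64, 128), dtype=np.float32),  # (out, in)
    "encoder.linear.bias": np.zeros(64, dtype=np.float32),
}

# Map each source name to the name the target framework expects,
# plus any layout fix (e.g. Keras Dense stores its kernel as (in, out)).
name_map = {
    "encoder.linear.weight": ("encoder/dense/kernel", lambda w: w.T),
    "encoder.linear.bias": ("encoder/dense/bias", lambda w: w),
}

dst_weights = {}
for src_name, array in src_weights.items():
    dst_name, convert = name_map[src_name]
    dst_weights[dst_name] = convert(array)

assert dst_weights["encoder/dense/kernel"].shape == (128, 64)
```

The tedious part is building that `name_map` correctly for hundreds of ops, which is exactly what generic converters struggle to automate.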
There are tools to convert to intermediate formats, like ONNX, but they are limited and don't work all the time. The automatic conversion tools usually assume that you can trace execution of the model for a dummy input and usually only work well if there isn't any complex logic (e.g. conditions can be problematic). Some operations aren't supported well, etc.
This isn't always technically difficult, but it's tedious because it usually involves double-checking at every step that the model produces identical outputs for a given input. An additional challenge when transferring weights is that models are fragile and minor differences can have large effects on the predictions (even though if you trained from scratch, you might get similar results).
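The checking step usually amounts to comparing outputs under a small tolerance rather than demanding bitwise equality, which different frameworks rarely achieve. A minimal sketch (the numbers and helper name are made up for illustration):

```python
import numpy as np

def outputs_match(a, b, atol=1e-4):
    """Check that two frameworks' outputs agree within a small tolerance;
    exact equality is too strict across frameworks."""
    a, b = np.asarray(a), np.asarray(b)
    return a.shape == b.shape and np.max(np.abs(a - b)) <= atol

# Simulated outputs of the "same" layer in two frameworks: identical
# math, plus an accumulated float error on the order of 1e-5.
x = np.ones((1, 16), dtype=np.float32)
out_a = x * 0.5
out_b = (x * 0.5) + np.float32(1e-5)

assert outputs_match(out_a, out_b)           # harmless float drift
assert not outputs_match(out_a, out_b * 2)   # a real conversion bug exceeds atol
```

In practice you'd run such a comparison layer by layer to localize where a conversion went wrong.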
Also for deployment, the less cruft in the repository the better. A lot of research repositories end up pulling in all kinds of crazy dependencies to do evaluation, multiple big frameworks etc.
I don't understand why execution of a model with the same layers and weights would be different between PyTorch and Tensorflow.
Is it a problem of accumulation of floating-point errors in operations that are done in a different order and with different kinds of arithmetic optimisations (so that they would be identical if they used un-optimised symbolic operations), or is there something else in the implementation of a neural network that I'm missing?
In principle you can directly replace function calls with their equivalents between frameworks, this works fine for common layers. I've done this for models that were trained in PyTorch that we needed to run on an EdgeTPU. Re-writing in Keras and writing a weight loader was much easier than PyTorch > ONNX > TF > TFLite.
Arithmetic differences do happen between equivalent ops, but I've not found that to be a significant issue. I was converting a UNet and the difference in outputs for a random input was at most O(1e-4), which was fine for what we were doing. It's more tedious than anything else. Occasionally you'll run into something that seems like it should be a find+replace, but it doesn't work because some operation doesn't exist, or some operation doesn't work quite the same way.
It's just that expressing those "layers and weights" in code is different in Tensorflow and Pytorch. I think a good parallel would be expressing some algorithm in two programming languages. The algorithm might be identical, but JS uses `list.map(fn)` and Python uses `map(fn, list)`, and JS doesn't have priority queues in the "standard lib" while Python does, etc. Similarly, the low-level ops and higher-level abstractions are (slightly) different in Pytorch and Tensorflow.
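One concrete low-level difference of this kind is tensor layout: PyTorch stores Conv2d weights as (out_channels, in_channels, kH, kW), while TF/Keras Conv2D expects (kH, kW, in_channels, out_channels), so a converter has to transpose. A sketch with numpy standing in for both frameworks (shapes are illustrative):

```python
import numpy as np

# PyTorch Conv2d weight layout: (out_channels, in_channels, kH, kW)
torch_kernel = np.arange(2 * 3 * 5 * 5, dtype=np.float32).reshape(2, 3, 5, 5)

# TF/Keras Conv2D kernel layout: (kH, kW, in_channels, out_channels)
tf_kernel = torch_kernel.transpose(2, 3, 1, 0)

assert tf_kernel.shape == (5, 5, 3, 2)
# Same numbers, different layout: filter 0, input channel 1, position (2, 4)
assert tf_kernel[2, 4, 1, 0] == torch_kernel[0, 1, 2, 4]
```

Getting one of these transposes wrong typically doesn't crash anything; the model just produces garbage, which is why the output-checking step matters.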
I'm not too familiar with Tensorflow, so I can't give an example there, but a similar issue I recently faced when converting a model from Pytorch to ONNX is that Pytorch has a built-in discrete Fourier transform (DFT) operation, while ONNX doesn't (not yet; they're adding it). So I had to express a DFT in terms of other ONNX primitives, which took time.
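The core of such a rewrite is that a DFT is just a matrix multiply, which every runtime supports. A numpy sketch of the idea (a real ONNX graph would additionally carry the real and imaginary parts as separate tensors, since ONNX has no complex dtype):

```python
import numpy as np

def dft_via_matmul(x):
    """Express a 1-D DFT using only matmul/exp/range-style primitives,
    the kind of rewrite needed when a target runtime lacks a DFT op."""
    n = x.shape[-1]
    k = np.arange(n)
    # DFT matrix: W[j, k] = exp(-2*pi*i*j*k / n)
    W = np.exp(-2j * np.pi * np.outer(k, k) / n)
    return x @ W

x = np.arange(8, dtype=np.float64)
assert np.allclose(dft_via_matmul(x), np.fft.fft(x))
```

The matmul form is O(n^2) instead of the FFT's O(n log n), which is usually an acceptable trade-off for short transforms when the alternative is no export at all.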
In principle all operations can be translated between frameworks, even if some ops aren't implemented in one or the other. This, however, depends on whether the translation software supports graph rewriting for such nodes.
Lambdas and other custom code are also problematic, as their code isn't necessarily stored within the graph.
Seems like it'll be a serious issue for people hoping we can someday upload human brains into machines if we can't even transfer models from TensorFlow to PyTorch reliably.
Unrelated problems, really. Having written such a translation library, I can say with confidence that the only reason for this is lack of interest in it.
Graph to graph conversion can be tricky due to subtle differences in implementation (even between different versions of the same framework), but it's perfectly possible, though not many utilities go all the way to graph rewriting if required.
Problems arise with custom layers and lambdas, which are not always serialised with the graph depending on the format.
Human brains have high degrees of plasticity -- our brain is much more generic than its usual functional organization would suggest. I don't think we'd be able to upload brains ("state vectors" was the sci-fi buzzword) before digital substrates are able to emulate that.
They converted the original JAX weights to the format that Pytorch uses. Because JAX is still fairly new, it can be a lot easier to get Pytorch to run on e.g. CPU. I do find the number of upvotes interesting and I imagine many people just upvote things that have DALLE in the title, to a degree.
Look how much easier it is to install & run, people are interested in and up-voting the result, not how much work was (or wasn't) required to achieve it.
> RuntimeError: This version of jaxlib was built using AVX instructions, which your CPU and/or operating system do not support. You may be able work around this issue by building jaxlib from source.
Unfortunately, following the instructions to build jaxlib from source (https://jax.readthedocs.io/en/latest/developer.html#building...) results in several 404 Not Found errors, which later cause the build to stop when it tries to do something with the non-existent files.
Unfortunately, it looks like I won't be running this today.