Hacker News

It sounds like you're literally describing Dreambooth. You can easily and quickly train models for certain styles like Disney characters, certain illustrators, specific artists and even tailor the model to a specific person so you can produce countless pictures of them in different poses, locations and clothes. This is already available.


Indeed: https://dreambooth.github.io

Note that training is extremely expensive, and is beyond the capabilities of most end users. Here are the details of their training method:

> Given ~3-5 images of a subject we fine tune a text-to-image diffusion model in two steps: (a) fine tuning the low-resolution text-to-image model with the input images paired with a text prompt containing a unique identifier and the name of the class the subject belongs to (e.g., "A photo of a [T] dog"); in parallel, we apply a class-specific prior preservation loss, which leverages the semantic prior that the model has on the class and encourages it to generate diverse instances belonging to the subject's class by injecting the class name in the text prompt (e.g., "A photo of a dog"). (b) fine-tuning the super resolution components with pairs of low-resolution and high-resolution images taken from our input images set, which enables us to maintain high fidelity to small details of the subject.
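
The two-term objective in step (a) can be sketched roughly like this. This is a hypothetical, heavily simplified illustration, not the paper's implementation: real DreamBooth training computes diffusion noise-prediction losses over image tensors, while here `mse` over flat lists of floats stands in, and all names (`mse`, `dreambooth_loss`, `lambda_prior`) are illustrative.

```python
def mse(pred, target):
    """Mean squared error between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def dreambooth_loss(subject_pred, subject_target,
                    class_pred, class_target, lambda_prior=1.0):
    # Reconstruction term: fit the ~3-5 subject images
    # (prompted with "A photo of a [T] dog").
    subject_term = mse(subject_pred, subject_target)
    # Prior-preservation term: keep generating diverse instances of
    # the class ("A photo of a dog"), so the class prior isn't forgotten.
    prior_term = mse(class_pred, class_target)
    return subject_term + lambda_prior * prior_term
```

The second term is what keeps the fine-tuned model from collapsing every "dog" onto the few training photos.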

Each fine-tuned model is a full copy of the original. So if the base model is 10GB, the fine-tuned version will be a separate 10GB file. That might not sound like much, but with one model per subject or style it quickly adds up.

In this case, end users are artists. One could imagine a cloud-based art program which will fine tune on demand. That certainly seems like a good startup idea.


The Dreambooth extension for automatic1111 just came out. It can even be run on the CPU. I haven't tried that extension myself yet, but I followed a YouTube video last week and trained a model using a Colab in a few minutes (of training - getting the hang of the whole process took maybe 20 or 30 minutes, including watching the video). I think Dreambooth is already perfectly within reach of regular users, and is already being used for... ehem... questionable purposes that took thousands of images and days and days of tweaking and training for 'traditional deepfakes' just 6 months ago. The pace of advancement is breathtaking; it's nearly impossible to keep up even if you spend 100% of your time on this.


Thanks for the tip about automatic1111 with dreambooth!

Do you happen to have a link to that YouTube video you followed?


It was actually two videos, now that I look at my bookmarks: https://www.youtube.com/watch?v=w6PTviOCYQY and https://www.youtube.com/watch?v=FaLTztGGueQ . The first one is about a month old and was already about running Dreambooth locally. So to be perfectly honest, I'm not really sure anymore which info I got from which video. Probably most of it from the second one. It's worth it for the thumbnail alone - although I haven't been able to get results at that level yet, holy smokes is it impressive.


Not OP, but this is the fastest tutorial I've found for getting Dreambooth up and running in automatic1111:

https://youtu.be/_GmGnMO8aGs


"Extremely expensive" - you can run Dreambooth on a free Colab with recent optimizations.

This area is moving really fast, one-click solutions are already being created.

People are also averaging weights of multiple models to create new models based on multiple other dreambooth models, and it works surprisingly well.
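
The weight-averaging trick can be sketched as a per-parameter weighted mean of two checkpoints. This is a hypothetical illustration: real checkpoints are dicts of torch tensors, while plain Python lists of floats stand in here, and `merge_checkpoints` is an illustrative name, not an API from any tool.

```python
def merge_checkpoints(state_a, state_b, alpha=0.5):
    """Return a new state dict: alpha * A + (1 - alpha) * B, per weight."""
    if state_a.keys() != state_b.keys():
        raise ValueError("checkpoints must share the same parameter names")
    return {
        name: [alpha * a + (1 - alpha) * b
               for a, b in zip(state_a[name], state_b[name])]
        for name in state_a
    }
```

With alpha=0.5 this is a straight average; sliding alpha toward 0 or 1 biases the merge toward one parent model, which is how people blend, say, two styles.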

The stable diffusion reddit is a good place to see how all of this is developing.


Stable Diffusion is probably far from the state of the art, but good $DEITY did it open the floodgates! Just watching the open source scene evolve as a bystander is fascinating.


What kind of GPU would you need to run this locally? (the training I mean)


Using offload to CPU (with DeepSpeed) one can get away with an 8GB GPU. I haven't tried training myself yet, but there are reports of it working with 8GB of VRAM (here, for example: https://www.reddit.com/r/StableDiffusion/comments/xwdj79/dre...)

I have an RTX 2070 with 8GB and it has been working quite well for me. However, there are always models that will not fit. For those, running on the CPU with potential NVMe offload is not that bad. For example, a single inference on BLOOM 7B (30GB of RAM required just for the weights) on a 32GB RAM machine takes about 30s (it has to offload a few GB to NVMe). This is on a Zen 3 Ryzen with no GPU use. I can't wait to try CPUs that support AVX-512.
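
The "30gb of ram required just for weights" figure checks out with simple arithmetic: parameter count times bytes per parameter. A minimal sketch, assuming fp32 (4 bytes per weight); fp16 halves it. `weight_memory_gb` is an illustrative helper, not from any library.

```python
def weight_memory_gb(n_params, bytes_per_param=4):
    """Memory needed to hold the weights alone, in GB (10^9 bytes)."""
    return n_params * bytes_per_param / 1e9

bloom_7b_fp32 = weight_memory_gb(7e9)                      # 28 GB in fp32
bloom_7b_fp16 = weight_memory_gb(7e9, bytes_per_param=2)   # 14 GB in fp16
```

28GB of fp32 weights plus activations and runtime overhead is why a 32GB machine has to spill a few GB to NVMe.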


> Note that training is extremely expensive, and is beyond the capabilities of most end users.

You can't run it on consumer hardware, but you can just rent a GPU (or use a free Colab notebook) for a few hours to generate the model. Then you download it and reuse it locally at will. Yes, you need storage, and if you train often it can get expensive, but it is by no means beyond the capabilities of end users, at least professional ones. And of course there are growing libraries of freely available pretrained models.

> In this case, end users are artists. One could imagine a cloud-based art program which will fine tune on demand. That certainly seems like a good startup idea.

Very much agree about this. At least for a while, I strongly believe that AI will just be another tool for artists willing to embrace it, far from replacing them.


I'm building an app that lets users with no technical background train their own dreambooth models for $2-$4: https://synapticpaint.com/dreambooth/info/ They can also share their trained models for others to use. I think making this easy (no figuring out how to do a git pull or rent a gpu) plus the community sharing aspects will make this technology a lot more accessible to artists and general users.



