Stable Diffusion 3 banned on CivitAI due to license (civitai.com)
44 points by samspenc on June 17, 2024 | hide | past | favorite | 19 comments


> we are temporarily banning [...] all models or LoRAs trained on content created with outputs from SD3 based models.

Again we see this bizarre standard where the licensing of AI model output is taken more seriously than the licensing of any other type of training data. CivitAI is absolutely filled with LoRAs trained on the work of artists who refuse to license it for that purpose, but only LoRAs based on the output of an AI model with a restrictive license are over the line apparently?

It's a similar story with LLMs, e.g. OpenAI scooping up unlicensed data to train their models, but then turning around and proclaiming that the license of ChatGPT's output forbids using it to train a competing model. The license of data used for training either matters or it doesn't; you can't have it both ways.


I wonder if it's enforceable, since model output isn't copyrighted, at least in the US.


(IANAL) Assuming it's not a copyright violation, just a contract violation, then if you get access to the model weights without entering said contract you could make a strong case that none of the restrictions apply to you. However, as far as I can tell this is not settled case law, and the entities distributing these model weights are definitely hoping for them to qualify for some level of copyright protection. But even if that's the case, the resulting images are not currently subject to copyright in the US, and can be used freely by non-parties. Even if that changes down the line, these images would almost certainly be considered derivative works of the model, and these models already assume that training is fair use anyway.


What happens when output licensing is enforced? It will be a shitstorm, no?


Right, I think that's really the concern. People may care about artists in an abstract way, but the only thing that they are practically concerned about when it comes down to it, is what has a strong chance of causing significant legal or other real world issues for them.

It's also partly a reaction to the whole episode and what everyone is saying. Everyone is piling on SD3 right now. It's what you do. Not that they didn't earn it.

But since everyone is talking about it, that must have at least got them thinking. It isn't good, there are alternatives, and the company's questionable actions seem aimed at collecting fees from the model. It makes sense to just avoid it. PixArt and Lumina should get more attention anyway.

I keep hearing about closed source diffusion transformer projects. I hope we will see more open source like Lumina.


Output licensing is not enforceable currently. Only humans can produce copyrightable content. I haven't heard of a single country that has changed this.


It should be banned on performance alone.

It was obvious with how many times they said “SAFETY” (lobotomized) during their press releases and comments, but I’m still pretty surprised just how bad it is.

If you haven’t seen it, the first image is the now-iconic SD3 unofficial picture.

https://www.reddit.com/r/AIfreakout/comments/1dekkbz/why_is_...

Luckily for open source and local running, Lumina and PixArt are catching up.


To be fair, SDXL was really bad at launch too. Now people are comparing the SD3 base model to fine-tuned SDXL outputs... But yeah, I don't have much hope for Stability now that the original diffusion team and even comfyanonymous have all left. Long live Chinese open-source models, I guess.


SDXL had bad details, but I could absolutely do people in poses other than standing up.

While those lying-in-the-grass images are iconic, Lykon (one of their employees) said on Discord, “I don’t know what people could possibly be prompting to get human blobs.” I sent him the first image I got from “photo of a woman getting a massage on a massage table” or something like that, and it’s even worse. I went back and checked, and while SDXL isn’t great at it, it’s not complete body horror, unlike SD3.

People have noticed that SD3 is way better at posing Barbies than people, so it’s pretty clear they screwed it up in the name of safety.


I'm fairly confident anyone thinking this is about safety is just paranoid and misinformed.

SD3 medium is a much smaller model than SDXL. I don't really get why they would have published it in the current state, but I'm kind of certain that the full sized versions will be absolutely fine. You don't need pornography in the training set to understand basic human, non-genital anatomy.

Small model LLMs are hot garbage too.


The model can effortlessly create a highly realistic and anatomically correct photograph of a koala riding a motorcycle, but begins to produce crazy body horror as soon as a human shape is involved. It's neutered.

Artist styles are aggressively scrubbed out as well. It has no idea what a Picasso or Dali style is, even though there must be plenty of derivative work in the training data that should have taught it what those styles are, even with all the original works removed.

It can do the Mona Lisa, but the face itself will become comically chunky, almost as if it is censoring itself by consciously turning "forbidden" content into something that is definitely not that.


I fully believe that artist styles and specific people have been removed from the training data. I fully believe it doesn't know about porn or especially porny positioning. I don't really care. That's reasonably something for community fine tunes.

I don't believe for a second that the model can "effortlessly" produce a highly realistic koala. It's just not a good model. It makes a ton of mistakes. I don't really believe that it is meaningfully worse at drawing non-erotic humans than any other similarly sized model.

People are feeling personally attacked that the model isn't meeting their standards.


Here's the first five results of a koala riding a motorcycle and the first five of a man kicking a football: https://imgur.com/gallery/sd3-tests-u8iXAcx

I added the koala kicking a football as well, and I have to admit that they come out just as bad. Maybe it's irregular poses in general that it just can't do? But from the tests I've done so far, it's mainly humanoid shapes that almost always have some issues.

The problem is, if certain training data is blanket removed, it creates holes in the understanding a model has. We see this in language models as well: censored models can get very obstinate in their refusal to discuss certain completely normal topics, because it links them to something that has been scrubbed.


> I added the koala kicking a football as well, I have to admit that they come out just as bad.

And this is exactly my point.

> The problem is, if certain training data is blanket removed, it creates holes in the understanding a model has. We see this in language models as well: censored models can get very obstinate in their refusal to discuss certain completely normal topics, because it links them to something that has been scrubbed.

This is doubly wrong. First, not having porn in your dataset has absolutely no bearing on your ability to draw a man kicking a football. Second, the thing you're discussing is not a lack of training data but fine-tuning to try and remove it, due to the inevitability of some porn slipping in. SDXL did this too, but it's a perfectly serviceable model.

What we're seeing here is almost certainly just the model being bad because it's much smaller. Drawing unusual poses like kicking a ball is much more difficult than people expect.

This drastic drop in performance is entirely consistent with similar drops in performance seen in LLMs for a comparable change in parameter count.

People are contorting themselves into a conspiracy because they're seeing a lot of bad humans get drawn because HUMANS ARE WHAT PEOPLE OVERWHELMINGLY TRY TO DRAW. The model is just generically bad.


But the model is _really good_ at other subjects, including very complicated ones, like the motorcycle.

By "certain training data" I don't mean porn. It's probably for the best that they removed that, but the aggressive removal of all copyrighted data, art, and images of celebrities must have impacted the output quality compared to the older models that just used everything. This includes many pictures of people playing football, women lying in grass, and other similar things. Most fine-tuning is basically about bringing that back.

If "HUMANS ARE WHAT PEOPLE OVERWHELMINGLY TRY TO DRAW", then you would expect the model to be better at it, no? Or at least have _some_ understanding of poses besides T.

It's not a conspiracy to wonder why the main thing a smaller model fails on is humanoid poses, when things like objects, animals, and architecture generally come out just fine. Are you saying an image like this is significantly easier to create than a man kicking a football?

https://imgur.com/gallery/sd3-test-8nQyBz2

Try the prompt yourself, it will keep producing perfectly fine images with only small mistakes. Further testing seems pointless though, since by now it's confirmed that the current model was a failed product that got rushed out for some reason.


You're vastly underestimating the difficulty of drawing humans. Move your hands just a little bit and you have a completely different geometry. Humans are MUCH harder to draw than a motorcycle. A motorcycle pretty much always looks the same.

It is much, much easier to draw your link vs. a human.

An Aztec castle (edit: not sure if this is a "castle", but Age of Empires 2 taught me that's what it is) always looks like that. Foliage has many variants, but all of them are basically fine. Same for rocks. The seal has minor variation, probably not a lot that the model cares about. The seal is wrong anyway.

Man kicking a football has many possible interpretations and most of the model errors are simply reflecting the model being bad at deciding to do just one.

The model could be bad because it simply has less data, but it's much easier to explain it as bad because it's a smaller, less capable model. Just like we see with LLMs.


Comfy seemed to be saying they deliberately manipulated the weights to basically sabotage anything remotely erotic.


Sure. You have to twist SDXL's arm hard to get it to draw a penis too.

That does not explain it being really bad. What does explain it being really bad, is the fact that it's just small and bad.


Ok to train on human outputs but not model outputs?



