For all the people represented in the training data to receive royalties would be an incredible wealth transfer to the Extremely Online. My forum posts, StackOverflow answers, etc. also contribute to the model outputs. The training data, by volume, mostly belongs to blog authors, redditors, Wikipedia editors: to us!
The people in that counting-to-infinity subreddit would get compensated a lot if this were fully automated; their posts were so overrepresented in the training set that many of their usernames became whole tokens (e.g. SolidGoldMagikarp).
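You can check that yourself; a quick sketch, assuming the glitch tokens live in the GPT-2/GPT-3 BPE vocabulary that tiktoken ships as r50k_base:

    # If the claim holds, this prints a single token id for the whole username.
    import tiktoken

    enc = tiktoken.get_encoding("r50k_base")   # GPT-2 / GPT-3 vocabulary
    ids = enc.encode(" SolidGoldMagikarp")     # the leading space matters for BPE
    print(ids, [enc.decode([i]) for i in ids])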
I object to calling people chatting online artists.
However, ultimately nobody is going to pay them more than what their posts are worth to the AI company, which puts a severe cap on the amount. People who post a great deal of online content might be worth compensating a few thousand dollars, but it would be hard for them to turn that down.
I think the lower bound for someone signing away the rights to their whole art portfolio is closer to $1M than to a few thousand. A few thousand is just a month's salary that they could "make" themselves. Offers that small would be almost off-putting.
Having a relatively new LinkedIn account is probably a very bad move right now if you don't have an established network to reach out to for jobs. Tons of AI-generated profiles from scammers who create fresh LinkedIn accounts are flooding every job post (particularly remote ones), and a new profile is one of the most frequent signs of a fake submission.
Controlled burns aren't impossible in chaparral; even by that article's own logic, they just need to be less frequent and more intense than burns in forest. There's no reason they couldn't be done.
Some of the neighborhoods that burned consist of very steep hills, single-family houses on stilts, narrow winding roads, retaining walls, and almost no clearance for anything. I can't imagine a controlled burn being done safely there.
A founding CTO is more effective than a hired CTO because the founding CTO has more moral authority to impose a consistent system. At other companies there's infighting between people (senior engineers, senior managers) with different architectural preferences (e.g. microservices vs. monoliths, Java vs. Python). These senior people each get half of what they want, meaning half your system works one way and half the other. A founding CTO can hold to a single vision.
It could be that the moral authority stems from having as full a picture as any single person can have of the entire lifecycle of the company, but I think a lot of it is also just the effect of "I got you here."
I'm glad pg named this effect, since I've talked about the related phenomenon for CTOs with many people.
> Those customers are already untrusted, so it really does not matter.
Perhaps it doesn't matter to the health of your network, but if it leads to a customer's account being disabled due to incorrectly assigned abuse, surely it would matter to them.
How in tarnation would they do that? To inject traffic into the network, the attacker would have to compromise the access network. The RADIUS attack is not going to accomplish that.
I mean, I know nothing about your network. If your network access servers are within a datacenter under your exclusive physical control, perhaps it's not an issue, since the attack requires a man-in-the-middle position. But something like a neighborhood cabinet DSLAM could be open to abuse?
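To make the man-in-the-middle point concrete: per RFC 2865, the RADIUS reply is validated by a Response Authenticator, an MD5 over the reply fields, the Request Authenticator, and the shared secret, so forging an Access-Accept means being able to see and rewrite packets between the NAS and the RADIUS server. A rough sketch of that check (all field values here are made up):

    import hashlib
    import struct

    def response_authenticator(code, ident, attrs, request_auth, secret):
        # RFC 2865: MD5(Code + ID + Length + RequestAuth + Attributes + Secret)
        length = 20 + len(attrs)                  # 20-byte header + attributes
        header = struct.pack("!BBH", code, ident, length)
        return hashlib.md5(header + request_auth + attrs + secret).digest()

    # e.g. an Access-Accept (code 2) with no attributes and a dummy secret:
    auth = response_authenticator(2, 1, b"", b"\x00" * 16, b"shared-secret")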
It would be interesting to let users of models customize inference by tweaking these features, sort of like a semantic equalizer for LLMs. My guess is that this wouldn't work as well as fine-tuning, since fine-tuning tweaks all the features at once toward your use case, but the equalizer would require zero training data.
The prompt itself can trigger the features, so if you say "Try to weave in mentions of San Francisco," the San Francisco feature will be more activated in the response. But having a global equalizer could reduce drift as the conversation continues, perhaps?
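If you had raw access to the activations, the equalizer could be as simple as adding a scaled feature direction into the residual stream during generation. A hypothetical sketch (PyTorch-style; the feature vector, layer index, and gain are made-up stand-ins, and nothing like this is exposed by the hosted APIs):

    import torch

    def make_steering_hook(feature_dir: torch.Tensor, gain: float):
        # Forward hook that nudges a layer's hidden states along feature_dir.
        direction = feature_dir / feature_dir.norm()
        def hook(module, inputs, output):
            hidden = output[0] if isinstance(output, tuple) else output
            steered = hidden + gain * direction.to(hidden.device, hidden.dtype)
            return (steered, *output[1:]) if isinstance(output, tuple) else steered
        return hook

    # Usage sketch: pick a middle layer, dial the "San Francisco" slider up,
    # generate, then remove the hook to restore default behavior.
    # handle = model.transformer.h[20].register_forward_hook(
    #     make_steering_hook(sf_feature, gain=4.0))
    # ... generate ...
    # handle.remove()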
At least for right now, this approach would in most cases still be like using a shotgun instead of a scalpel.
Over the next year or so I'm sure it will be refined enough to act more like a multiplier on feature activations, but simply flipping a feature on globally is going to create a very 'obsessed' model, as stated.
(Author) Good point! I picked the top categories by number of mentions. There were 24 entries that mentioned "code" or "copilot" (out of all the disclosures), and a third of them actually went out of their way to state that there was NO AI code gen, typically like so:
> it is not used in the game itself in any area: 3D models, code...
I suspect that a more rigorous perusal of the metadata (i.e., more than those quick search terms) would turn up some more, but either way, it seemed like such a tiny fraction of the whole.
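For the curious, the keyword pass was roughly the kind of thing sketched below (the sample disclosure strings are made up, not real entries):

    import re

    disclosures = [
        "Generative AI was used for concept art only; it is not used in the game itself in any area: 3D models, code...",
        "Some boilerplate code was drafted with Copilot and reviewed by hand.",
    ]

    code_mentions = [d for d in disclosures
                     if re.search(r"\bcode\b|copilot", d, re.IGNORECASE)]
    negated = [d for d in code_mentions
               if re.search(r"\b(no|not|without)\b[^.]*\bcode\b", d, re.IGNORECASE)]

    print(len(code_mentions), "mention code/copilot;", len(negated), "of those negate it")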
Well shit, all of my code from the last 3+ years would need a trigger warning then :D
I've found Copilot (and its like) to be essential in the way I work.
It's a lot faster to ask an AI assistant to do the boring, repetitive bits and then glance through them than to do it myself: checking documentation, writing, getting bored, my ADD kicking in, and now I'm on Wikipedia reading about some weird castle a baron built on top of a mountain, just because. =)
> The single number that should summarize your expectations about any LLM is the number of total flops that went into its training.
One thing I've been curious about is whether a model that's trained well beyond the Chinchilla-optimal level of compute will suffer more from quantization. All of that extra information has to live somewhere in the weights, so it stands to reason that you may need to keep more bits per weight to retain the performance benefit.
If so, it would also mean that a smaller model that's been "overtrained" but can't be quantized without quality loss isn't necessarily cheaper for inference than a larger model that isn't overtrained but can be aggressively quantized. I haven't seen anyone discuss this, but maybe there's a paper on it.
If you could characterize what level of overtraining leads to quality loss at different levels of quantization, you could possibly figure out an optimal amount of overtraining. E.g. if you train on 10T tokens and see quality loss at 4-bit, and you train on 20T tokens and see quality loss at 6-bit, you can fit a curve to those data points to estimate the maximum number of tokens the model can train on with the current methodology.
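To make that concrete, here's a toy version of the curve fit using exactly those two made-up data points and a linear extrapolation to where even 16-bit weights would stop holding the extra information:

    import numpy as np

    tokens = np.array([10e12, 20e12])    # training tokens (hypothetical)
    loss_bits = np.array([4, 6])         # bit-width where quality loss first appears

    slope, intercept = np.polyfit(tokens, loss_bits, 1)   # trivial linear fit

    # Token count at which the fitted threshold reaches 16 bits, i.e. even
    # unquantized bf16 weights would start losing quality.
    max_tokens = (16 - intercept) / slope
    print(f"~{max_tokens / 1e12:.0f}T tokens")             # ~70T with these numbers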