I wonder how many of these conclusions are Claude-specific (given that Anthropic only used Claude as a test subject) and how many extrapolate to other transformer-based models as well. Would be great to see the research tested on Llama and the DeepSeek models, if possible!