I think this might be a result of "the only winning move is not to play", so to speak. If, in the minds of the AI agents, the game is unwinnable but not playing is not an option, they start picking random actions instead.
I'm not sure whether the AI can surrender (I only managed to watch the first two games, as it was rather late at night), but it might be a path worth exploring: having the AI give up once the game can no longer be won.
At what percentage would you allow an AI to consider a game unwinnable? An AI that behaves erratically when the odds are low might be worth letting forfeit, but the thing about humans is that we make mistakes. An ideal AI that can keep executing reasonable moves should therefore have a lower win-probability threshold for deciding to forfeit. See this match[0] for an example of a spectacular comeback that an AI might have written off as forfeit-worthy if the threshold were not well defined.
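To make the threshold idea concrete, here is a minimal sketch of a resign rule gated on *sustained* low win probability rather than a single dip. All names and values are illustrative assumptions, not anything from the games discussed:

```python
def should_resign(win_prob_history, threshold=0.05, patience=10):
    """Resign only if the agent's estimated win probability has stayed
    below `threshold` for the last `patience` evaluations, reducing the
    chance of forfeiting a game that is still winnable via opponent
    mistakes. (Hypothetical rule for illustration.)"""
    recent = win_prob_history[-patience:]
    return len(recent) == patience and all(p < threshold for p in recent)
```

Requiring the estimate to stay low for several consecutive evaluations is one way to keep a single pessimistic reading from triggering a premature forfeit.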
Maybe this is a limitation of self-play. If the opponent an AI faces during training is always optimal, then there's no surface area of mistakes to learn from. The losing AI, in its model of the game, knows the game is over past a specific threshold, so it has never learned to capitalize on mistakes.
I wonder if this situation can be fixed by adding more randomness. For example, force AI 1 into a losing position against AI 2, then suddenly switch AI 2's power level to be much weaker (where mistakes happen) so that AI 1 learns how to fight its way out of tough situations.
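One way to sketch that idea is opponent sampling over past checkpoints: most of the time train against the latest policy, but occasionally swap in an older, weaker one so the agent sees exploitable mistakes. Everything here (names, probabilities) is an assumption for illustration:

```python
import random

def pick_opponent(checkpoints, weak_prob=0.2):
    """Return a training opponent from a list of policy checkpoints,
    ordered oldest to newest. With probability `weak_prob`, pick an
    older (weaker) checkpoint so the learner gets comeback practice;
    otherwise train against the current strongest policy."""
    if len(checkpoints) > 1 and random.random() < weak_prob:
        return random.choice(checkpoints[:-1])  # an older, weaker policy
    return checkpoints[-1]  # the latest policy
```

This is the same spirit as the mid-game power-level switch above, just applied at the matchmaking level instead of mid-game.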
One of the most interesting takeaways from the post-game interview, for me, was that the AI can be very stupid if you just blindly throw it into a self-play setting, but that with clever use of randomization (modifying power levels) and action restrictions (for example, only allowing the agent to spend an anti-invis item when a nearby enemy goes out of sight), it is possible to provide better learning opportunities for the AI.
> for example, only allowing anti-invis items when an enemy goes out of sight
These are the kind of actions you specifically don't want to code in because you're throwing in human knowledge. You want the AI to learn by itself that using anti-invis when everyone is visible is a low-value move.
The purist in me was even mad that they had a hand-crafted evaluation function. (e.g. prefer gold, prefer taking towers, each given some arbitrary value)
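For readers unfamiliar with the objection, a hand-crafted evaluation function of the kind described is essentially a shaped reward: a hand-tuned weighted sum over game events. The features and weights below are made up for illustration; the actual values used were not published in this thread:

```python
# Arbitrary hand-tuned weights -- exactly the kind of baked-in human
# knowledge the comment objects to. Values are illustrative assumptions.
REWARD_WEIGHTS = {
    "gold_gained": 0.01,
    "towers_taken": 1.0,
    "win": 10.0,
}

def shaped_reward(events):
    """Sum hand-tuned weights over observed game events, instead of
    learning purely from the sparse win/loss signal."""
    return sum(REWARD_WEIGHTS.get(name, 0.0) * count
               for name, count in events.items())
```

The purist alternative would be to drop the intermediate terms and reward only the `win` event, accepting a much sparser learning signal.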