I think this might be a result of "the only winning move is not to play", so to speak. If, in the minds of the AI agents, the game is unwinnable but not playing is not an option, they start picking random actions instead.
I'm not sure whether the AI can surrender (I only managed to watch the first two games, as it was rather late at night), but it might be a path worth exploring: having the AI give up once the game can no longer be won.
At what percentage would you allow an AI to consider a game unwinnable? An AI that behaves erratically when the odds are low might be worth letting forfeit, but the thing about humans is that we make mistakes. An ideal AI that can keep executing reasonable moves should therefore have a lower win-probability threshold for deciding to forfeit. See this match[0] for an example of a spectacular comeback that an AI might have written off as forfeit-worthy if the threshold were not well defined.
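To make the threshold idea concrete, here is a minimal sketch of a resign rule gated on *sustained* low win probability rather than a single dip. All names and values are illustrative assumptions, not anything from the games discussed:

```python
def should_resign(win_prob_history, threshold=0.05, patience=10):
    """Resign only if the agent's estimated win probability has stayed
    below `threshold` for the last `patience` evaluations, reducing the
    chance of forfeiting a game that is still winnable via opponent
    mistakes. (Hypothetical rule for illustration.)"""
    recent = win_prob_history[-patience:]
    return len(recent) == patience and all(p < threshold for p in recent)
```

Requiring the estimate to stay low for several consecutive evaluations is one way to keep a single pessimistic reading from triggering a premature forfeit.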
Maybe this is a limitation of self-play. If the opponent an AI faces during training is always optimal, then there's no surface area of mistakes to learn from. The losing AI, in its model of the game, knows the game is over past a specific threshold, so it has never learned to capitalize on mistakes.
I wonder if this situation can be fixed by adding more randomness. For example, force AI 1 into a losing position against AI 2, then suddenly switch AI 2's power level to be much weaker (where mistakes happen) so that AI 1 learns how to fight its way out of tough situations.
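One way to sketch that idea is opponent sampling over past checkpoints: most of the time train against the latest policy, but occasionally swap in an older, weaker one so the agent sees exploitable mistakes. Everything here (names, probabilities) is an assumption for illustration:

```python
import random

def pick_opponent(checkpoints, weak_prob=0.2):
    """Return a training opponent from a list of policy checkpoints,
    ordered oldest to newest. With probability `weak_prob`, pick an
    older (weaker) checkpoint so the learner gets comeback practice;
    otherwise train against the current strongest policy."""
    if len(checkpoints) > 1 and random.random() < weak_prob:
        return random.choice(checkpoints[:-1])  # an older, weaker policy
    return checkpoints[-1]  # the latest policy
```

This is the same spirit as the mid-game power-level switch above, just applied at the matchmaking level instead of mid-game.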
One of the most interesting takeaways from the post-game interview, for me, was that the AI can be very stupid if you just blindly throw it into a self-play setting, but that with clever use of randomization (modifying power levels) and action restrictions (for example, only allowing the agent to spend an anti-invis item when a nearby enemy goes out of sight), it is possible to provide better learning opportunities for the AI.
> for example, only allowing anti-invis items when an enemy goes out of sight
These are the kind of actions you specifically don't want to code in because you're throwing in human knowledge. You want the AI to learn by itself that using anti-invis when everyone is visible is a low-value move.
The purist in me was even mad that they had a hand-crafted evaluation function. (e.g. prefer gold, prefer taking towers, each given some arbitrary value)
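For readers unfamiliar with the objection, a hand-crafted evaluation function of the kind described is essentially a shaped reward: a hand-tuned weighted sum over game events. The features and weights below are made up for illustration; the actual values used were not published in this thread:

```python
# Arbitrary hand-tuned weights -- exactly the kind of baked-in human
# knowledge the comment objects to. Values are illustrative assumptions.
REWARD_WEIGHTS = {
    "gold_gained": 0.01,
    "towers_taken": 1.0,
    "win": 10.0,
}

def shaped_reward(events):
    """Sum hand-tuned weights over observed game events, instead of
    learning purely from the sparse win/loss signal."""
    return sum(REWARD_WEIGHTS.get(name, 0.0) * count
               for name, count in events.items())
```

The purist alternative would be to drop the intermediate terms and reward only the `win` event, accepting a much sparser learning signal.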