What was your resolution to this issue? Did you fix your service to account for the API being down, or did you switch to an entirely different approach?
I can't recall the exact implementation details, but we logged the number of running instances to a file, then read back the last known instance count and the delta since launch, and made the system not get overly aggressive if it couldn't read the current set.
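For anyone wanting to try something similar, here is a rough sketch of the idea in Python, not the original implementation. The file name, the `MAX_STEP` cap, the `fetch_count` callable, and reading "delta" as "instances launched since the last good read" are all assumptions made for illustration.

```python
import json
import time
from pathlib import Path

STATE_FILE = Path("instance_count.json")  # hypothetical location for the cached count
MAX_STEP = 2  # hypothetical cap on launches per cycle when the API is unreachable


def record_count(count, path=STATE_FILE, now=None):
    """Persist the last successfully observed instance count."""
    now = time.time() if now is None else now
    path.write_text(json.dumps({"count": count, "ts": now}))


def load_last(path=STATE_FILE):
    """Return (count, timestamp) from the last good read, or None if unavailable."""
    try:
        data = json.loads(path.read_text())
        return data["count"], data["ts"]
    except (OSError, ValueError, KeyError):
        return None


def instances_to_launch(desired, fetch_count, launched_since_last_read=0, path=STATE_FILE):
    """Decide how many instances to launch this cycle.

    fetch_count is a caller-supplied callable (an assumption here) that
    returns the live instance count or raises if the API is down.
    """
    try:
        current = fetch_count()
        record_count(current, path)
        return max(0, desired - current)
    except Exception:
        last = load_last(path)
        if last is None:
            return 0  # nothing known yet: do not launch blindly
        # Estimate from the cached count plus what we launched since then.
        estimated = last[0] + launched_since_last_read
        return min(MAX_STEP, max(0, desired - estimated))
```

The point of the cap is that a failed read never turns into "assume zero instances and launch everything".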
We also added smart loading across AZs due to spot instances getting whacked when our fleet was outbid and AWS took them back.
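I no longer have the details of the AZ logic either, so the following is only one plausible reading of "smart loading": place each new instance in whichever zone currently has the fewest, so losing one zone's spot capacity does not take out the whole fleet. The function name and zone names are made up for the example.

```python
from collections import Counter


def spread_across_zones(to_launch, running_by_zone):
    """Assign new instances to zones, always filling the least-loaded zone first.

    running_by_zone maps zone name -> current instance count (hypothetical input).
    Returns a dict of zone name -> number of instances to launch there.
    """
    counts = Counter(running_by_zone)
    if not counts:
        return {}  # no known zones, nothing to plan
    plan = Counter()
    for _ in range(to_launch):
        # Ties are broken by zone name so the result is deterministic.
        zone = min(counts, key=lambda z: (counts[z], z))
        counts[zone] += 1
        plan[zone] += 1
    return dict(plan)


# Example: three new instances go to the two emptier zones.
print(spread_across_zones(3, {"us-east-1a": 4, "us-east-1b": 1, "us-east-1c": 2}))
# prints {'us-east-1b': 2, 'us-east-1c': 1}
```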
We also added other monitoring to make sure we weren't caught with a smart system doing dumb things.