One way to alleviate this problem is to treat all failures as having a fixed cost equal to an expensive successful request. For example, treat every response with an HTTP status code >= 400 as having taken 500ms. This works well even when there is a steady stream of faulty requests, since the penalty affects all backends equally.
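
As a rough illustration, the fixed-cost idea could be sketched in Python as below. The names `FAILURE_COST_MS`, `effective_latency_ms`, and `BackendStats` are hypothetical, and the exponentially weighted moving average is only an assumed way of tracking per-backend latency; the 500ms value is taken from the example above.

```python
from typing import Optional

# Assumed penalty for any failed request, matching the 500ms example above.
FAILURE_COST_MS = 500.0


def effective_latency_ms(status_code: int, observed_ms: float) -> float:
    """Return the latency sample to feed into the balancer's statistics.

    Any response with status >= 400 is charged the fixed failure cost,
    regardless of how quickly the backend actually returned it.
    """
    if status_code >= 400:
        return FAILURE_COST_MS
    return observed_ms


class BackendStats:
    """Tracks a moving average of effective latency for one backend.

    The EWMA here is an illustrative assumption; any latency statistic
    could consume the adjusted samples instead.
    """

    def __init__(self, alpha: float = 0.2) -> None:
        self.alpha = alpha
        self.ewma_ms: Optional[float] = None

    def record(self, status_code: int, observed_ms: float) -> None:
        sample = effective_latency_ms(status_code, observed_ms)
        if self.ewma_ms is None:
            self.ewma_ms = sample
        else:
            self.ewma_ms = self.alpha * sample + (1 - self.alpha) * self.ewma_ms


# A backend that fails fast (2ms errors) versus a healthy one (80ms successes).
healthy = BackendStats()
failing = BackendStats()
for _ in range(10):
    healthy.record(200, 80.0)
    failing.record(503, 2.0)
```

With raw latencies the failing backend would look far faster than the healthy one; with the fixed cost applied, its average sits at the penalty value, so a latency-based balancer would no longer prefer it.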