
I personally think that context will improve these systems.

The current task they are performing is extremely difficult. I think it is analogous to a listener receiving anonymous phone calls from strangers they have never heard before, each speaking in an unfamiliar accent in a noisy environment, talking for a few seconds about a random topic full of proper nouns the listener has never encountered, and then promptly hanging up, at which point the listener is asked for an exact transcription with no mistakes.

I don't think it is surprising that the error rates produced by Mechanical Turk workers seem high on some of these tasks; it actually seems like a significant accomplishment that a speech recognition system can do nearly as well under such difficult conditions.

Context about the speaker or the current subject of conversation would clearly help, as would visual cues, and the ability to ask the user to repeat or clarify something.

