Yeah, it's not like they will label something a train just because a single person says so. But if you have 10k responses with 95% confidence saying it's a train, it's very likely to be the case.
For unambiguous images almost all humans will label them the same way. For ambiguous ones humans will differ. Presumably they'll accumulate stats on each image and will be able to detect cases like this.
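Here is a rough sketch of what "accumulate stats on each image" could look like. It assumes the votes for one image are just a list of label strings. The thresholds `min_votes` and `agreement` are hypothetical, made up for illustration, and not values any real CAPTCHA provider is known to use. It also treats "95% confidence" loosely as a plain agreement share, not a statistical confidence interval:

```python
from collections import Counter

def summarize_votes(votes, min_votes=100, agreement=0.95):
    """Summarize crowd labels for one image as (label, share, status).

    Hypothetical thresholds: min_votes and agreement are illustrative only.
    """
    if len(votes) < min_votes:
        # Not enough responses yet to trust any label.
        return None, 0.0, "insufficient"
    # Most common label and the fraction of voters who chose it.
    label, count = Counter(votes).most_common(1)[0]
    share = count / len(votes)
    # High agreement -> accept the label; otherwise flag the image as ambiguous.
    status = "accepted" if share >= agreement else "ambiguous"
    return label, share, status

clear = ["train"] * 97 + ["bus"] * 3
unclear = ["train"] * 60 + ["tram"] * 40
print(summarize_votes(clear))    # label "train", status "accepted"
print(summarize_votes(unclear))  # label "train", status "ambiguous"
```

Note that a simple agreement share like this is exactly what a coordinated flood of identical bot answers would push over the threshold.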
Unless a properly obfuscated botnet has seeded the dataset with "everything is a train" responses, to the tune of "10k responses with 95% confidence saying it's a train".