> If we get 1 req/s, even for a dataset of that size, this is not as cost efficient.

How many req/s do you have in mind for your system to be a viable option?

> Also EBS throughput (even with SSD) is not good at all.

It's still no worse than S3, though, right?

> Chatnoir.eu is the only other common crawl cluster we know of. It consists of 120 nodes.

I don't have deep ES experience. Are you saying that to host 6TB of indexed data (before replication) you'd need a 120-node ES cluster? If so, then reducing that to just 2 nodes is the real sales pitch, not the S3 usage :)