> If we get 1 req/s, even for a dataset of that size, this is not as cost efficient.
How many req/s do you have in mind for your system to be a viable option?
> Also EBS throughput (even with SSD) is not good at all.
Still, it is no worse than S3, right?
> Chatnoir.eu is the only other common crawl cluster we know of. It consists of 120 nodes.
I have no deep ES experience. Are you saying that hosting 6TB of indexed data (before replication) would take a 120-node ES cluster? If so, then reducing it to just 2 nodes is the real sales pitch, not S3 usage :)