I tend to agree :). If we get 1 req/s, then even for a dataset of that size, this approach is not as cost-efficient.

For that kind of use case, I'd probably start using MinIO.
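
Just to illustrate: MinIO speaks the S3 API, so the client side barely changes; you point the same S3 client at a different endpoint. A minimal sketch (the endpoint, credentials and object names below are placeholders, not our actual setup):

  import boto3

  # MinIO exposes the S3 API, so a regular boto3 client works unchanged;
  # only the endpoint and credentials differ (placeholder values here).
  s3 = boto3.client(
      "s3",
      endpoint_url="http://localhost:9000",   # local MinIO instead of AWS S3
      aws_access_key_id="minioadmin",
      aws_secret_access_key="minioadmin",
  )

  # Fetch a byte range of an index file, the same access pattern used against S3.
  resp = s3.get_object(Bucket="index", Key="shard-0/postings", Range="bytes=0-65535")
  chunk = resp["Body"].read()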

> Seems comparable to AWS ElasticSearch service costs:
> - 3 nodes m5.2xlarge.elasticsearch = $1,200
> - 20TB EBS storage = $1,638

Don't forget that S3 includes replication. Also, EBS throughput (even with SSD) is not good at all. Our memory footprint is tiny as well, which is what makes it possible to run on just two servers.
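
Rough back-of-the-envelope, using the figures from the quote above, an index on the order of 6TB, and standard S3 pricing (about $0.023/GB-month, region-dependent, so treat it as an estimate):

  # Rough monthly storage-cost comparison (illustrative; prices vary by region).
  ES_NODES_USD = 1200.0        # 3x m5.2xlarge.elasticsearch (quoted above)
  EBS_20TB_USD = 1638.0        # 20TB EBS (quoted above)

  S3_USD_PER_GB = 0.023        # S3 Standard, assumed list price
  INDEX_SIZE_GB = 6 * 1024     # ~6TB of index before replication

  s3_storage = INDEX_SIZE_GB * S3_USD_PER_GB   # replication already included
  print(f"ES service: ${ES_NODES_USD + EBS_20TB_USD:,.0f}/month")
  print(f"S3 storage: ${s3_storage:,.0f}/month (plus two small search servers)")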

Finally, CPU-wise, our search engine is almost 2x faster than Lucene.

If you don't believe us, try to replicate our demo on an Elasticsearch cluster :D.

Chatnoir.eu is the only other Common Crawl cluster we know of; it consists of 120 nodes.



> If we get 1 req/s, even for a dataset of that size, this is not as cost efficient.

How many req/s do you have in mind for your system to be a viable option?

> Also EBS throughput (even with SSD) is not good at all.

Still, it's not worse than S3, right?

> Chatnoir.eu is the only other common crawl cluster we know of. It consists of 120 nodes.

I have no deep ES experience. Are you saying that to host 6TB of indexed data (before replication) you'd need a 120-node ES cluster? If so, then reducing it to just 2 nodes is the real sales pitch, not the S3 usage :)


What about d3en instances? Clustered, and together with MinIO, you might reach similar performance. The only issue is the traffic between nodes; it would need to stay inside the same AZ.

EDIT: Realizing that d3 instances only have slow HDDs.


Have you checked out the new EBS gp3 volumes? Throughput vs cost is much better on those than on gp2, and they're also cheaper than Provisioned IOPS volumes.
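
For scale, a quick estimate at the 20TB size quoted upthread, assuming us-east-1 list prices (treat these as approximate; gp3 lets you buy extra throughput separately):

  # Approximate EBS cost for a 20TB volume set (assumed us-east-1 list prices).
  SIZE_GB = 20 * 1024

  gp2 = SIZE_GB * 0.10                   # $0.10/GB-month, throughput tied to volume size
  gp3 = SIZE_GB * 0.08                   # $0.08/GB-month, 125 MB/s + 3000 IOPS included
  gp3_fast = gp3 + (1000 - 125) * 0.04   # provisioning up to 1000 MB/s at $0.04 per MB/s-month

  print(f"gp2:             ${gp2:,.0f}/month")
  print(f"gp3 baseline:    ${gp3:,.0f}/month")
  print(f"gp3 @ 1000 MB/s: ${gp3_fast:,.0f}/month")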



