
What exactly is "Big" here? It is about 1000 hard drives, several racks...
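For scale, a hedged back-of-envelope sketch of that claim (the 100 TB drive capacity and drives-per-rack figures below are illustrative assumptions, not numbers from the thread):

```python
import math

def drives_and_racks(data_pb, drive_tb=100, drives_per_rack=500):
    """Rough count of drives and racks needed for a dataset.

    Assumptions (hypothetical): 100 TB per drive, ~500 drives
    per densely packed storage rack.
    """
    total_tb = data_pb * 1000          # 1 PB = 1000 TB (decimal units)
    drives = math.ceil(total_tb / drive_tb)
    racks = math.ceil(drives / drives_per_rack)
    return drives, racks

# e.g. 100 PB on hypothetical 100 TB drives:
print(drives_and_racks(100))  # (1000, 2)
```

With those assumed figures, ~100 PB does indeed land around 1000 drives and a handful of racks.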


Usually the “big” qualifier is a function of RAM, not hard disk space. Getting hundreds of petabytes of data onto persistent storage in a “large” room has been possible for many years now.

“Big” data was never about how to store the data.


What I'm trying to say is: as soon as you can fit the data and processing unit(s) into one well-cooled room in a datacenter, managed by two guys per shift, it is not a "Big" problem anymore. Making it all local would probably speed up their queries/analytics enormously as well.


In my book, "bigger than 99.9% of organizations will ever encounter" is big enough to qualify as "big".


Ya, big data only starts at 100,000,000 PB. Everyone knows that.

There is no size requirement. It's more about what you collect, the frequency, the coverage, and how you use it.


Of course there is.

And anything you could fit into, basically, one well-cooled room is not "Big" anymore, sorry to tell you that.


"Big data" is anything that's too big to cram into a standard database and still access in reasonable times. There's a practical definition, it doesn't just mean "YUUUGE".


I think you're off by a factor of 100?



That website doesn't list a price, but I doubt that running a rack of these would be cost-effective. I wouldn't be surprised if those drives have a cost per terabyte at least a couple of times that of commodity drives.


Are you using this in prod? I’d be very surprised. I wouldn’t use this for prod data.


Not yet, but people are looking into them

My point is, you could fit all that Uber data (I know, I know, replication, sync, etc.) into racks in a SINGLE well-cooled room in a datacenter, managed by two guys per shift.

And this is not "Big" as far as I can see.

It would probably speed up their queries/analytics as well, with everything being local.



