Exploring ZFS ZIL with Intel Optane and NAND (servethehome.com)
121 points by olavgg on Dec 11, 2017 | 43 comments


I'm a little surprised that they didn't standardize the SSD capacity, or perform tests on empty vs nearly-full SSDs. Increasing size slightly decreases read speeds, just like L3 caches.


It is a little hard for us to do so since the M.2 devices came in 16GB and 32GB capacities. The U.2 Optane 900p currently comes only in a 280GB capacity, not in the multiples of 400GB that many other drives came in. We actually had, for example, an 800GB S3700 tested but left that result out as it was essentially the same as the 400GB drive.

On the full drive part, a several hundred GB SSD will almost always be nearly "empty" in practice for this application. That is why you see 8GB and 16GB RAM-based solutions (e.g. the venerable ZeusRAM).


The amount of space used by the ZIL on a SLOG is very modest, typically less than a gigabyte. So an SSD used as a SLOG will be practically empty its entire life. The reason one would use a large capacity SSD for this purpose is to take advantage of the large SSD's higher IOPS and greater wear leveling.
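
A rough sketch of why the ZIL footprint stays so small: the SLOG only has to hold synchronous writes that have not yet been committed to the pool by a transaction group. The function name, the 5-second transaction group interval, and the link speeds below are illustrative assumptions, not figures from the article or this thread.

```python
def slog_bytes_needed(sync_write_bytes_per_s, txg_interval_s=5.0, txgs_outstanding=1):
    """Estimate how much uncommitted ZIL data can sit on a SLOG at once.

    Assumption: log records are released once the transaction group that
    contains them has been written to the main pool.
    """
    return sync_write_bytes_per_s * txg_interval_s * txgs_outstanding


GB = 10**9

# Hypothetical ingest rates: a saturated 1 GbE and 10 GbE link of sync writes.
for label, rate in [("1 GbE", 125 * 10**6), ("10 GbE", 1250 * 10**6)]:
    print(f"{label}: about {slog_bytes_needed(rate) / GB:.2f} GB outstanding")
```

Under these assumptions even a saturated 10 GbE link keeps the outstanding log in the single-digit gigabytes, which is consistent with 8GB and 16GB RAM-based log devices being sold for this job.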


It's unfortunate but not surprising that you can't buy very small, very fast SSDs for this purpose.


Each NAND chip on an SSD is actually relatively slow. To get high read and write speeds, the access is multiplexed across all of the chips in the drive. Small capacity drives have a smaller number of NAND chips, which reduces their performance. In addition, each NAND cell supports only a very limited number of lifetime writes, typically 1k to 4k erase/write cycles. So drives do wear leveling, spreading writes over the whole drive. Therefore drives used in extremely write-heavy enterprise workloads may need to be very large just to support the required write endurance.
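
The two effects described above can be put into a toy model: throughput scales with the number of NAND dies until the host interface becomes the bottleneck, and endurance scales with capacity times the erase/write cycle rating. This is a minimal sketch assuming perfect wear leveling; the function names, per-die speed, interface limit, and write amplification factor are made-up illustrative values, not measurements.

```python
def aggregate_throughput_mb_s(per_die_mb_s, dies, interface_limit_mb_s):
    """Writes striped across all dies, capped by the host interface."""
    return min(per_die_mb_s * dies, interface_limit_mb_s)


def lifetime_host_writes_tb(capacity_gb, pe_cycles, write_amplification=1.0):
    """Idealized endurance in TB of host writes, assuming perfect wear leveling."""
    return capacity_gb * pe_cycles / write_amplification / 1000


# Hypothetical drives built from the same dies; only the die count differs.
for capacity_gb, dies in [(100, 4), (400, 16)]:
    speed = aggregate_throughput_mb_s(per_die_mb_s=40, dies=dies, interface_limit_mb_s=550)
    life = lifetime_host_writes_tb(capacity_gb, pe_cycles=3000, write_amplification=2.0)
    print(f"{capacity_gb} GB: ~{speed} MB/s, ~{life:.0f} TB of host writes")
```

In this model the larger drive is both faster, until it hits the interface limit, and four times as durable, purely because it has four times as many dies to spread the work over.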

From what I understand, Optane is fast enough not to need this extra controller processing, resulting in the incredible latency performance demonstrated by Optane drives. I suspect that if Intel wanted to sacrifice latency, they could dramatically increase top-line read and write performance by performing the same read and write interleaving that NAND SSDs do.


Intel's Optane SSDs do stripe across the controller's seven channels; that's the main reason why they're faster than the single-channel Optane Memory M.2 modules.

At any capacity or channel configuration, 3D XPoint devices benefit from not having to perform large block erase operations. They can perform reads and writes at the same granularity, allowing for in-place modification of data instead of requiring a log structure with garbage collection under the hood.


PMC-Sierra/Microsemi has some cool DRAM-based SSDs that do 1M IOPS (not a typo) with 16GB capacity. https://www.microsemi.com/product-directory/storage-boards/3... I don't think they're available in U.2 or M.2 format though.

NVDIMM-N would probably also be great for slog. https://www.anandtech.com/show/12029/micron-announces-32gb-d...


Yeah, DRAM chips are the only ones that currently offer the right balance of performance and capacity per die. NAND and 3D XPoint dies are too big and too slow per die for this use case, so they're better used as nonvolatile storage to flush the log to in the event of power failure, not as the primary log storage media.

I expect it will soon become common for high-end enterprise NVMe SSDs to include a few GB of DRAM accessible through the new Persistent Memory Region feature, so that it can be accessed through simple PCIe memory read and write operations but will automatically be preserved by the SSD in the event of a power failure. It won't be as fast as an NVDIMM, but it works with existing form factors and platforms.


Battery backed DRAM is likely a better choice for performance & endurance in most applications, as the Optane product line is a lame duck with memory that dies around the 1TB of writes mark, leaving you with lost data and a dead SSD.


> as the Optane product line is a lame duck with memory that dies around the 1TB of writes mark, leaving you with lost data and a dead SSD.

You seem to be much more concerned with venting anger than conveying accurate or relevant information.


Considering that the Intel drive in https://techreport.com/review/26523/the-ssd-endurance-experi... self-destructed when it ran out of guaranteed lifetime, I think it's a fair criticism.


There's nothing fair about making up a number that's wrong by multiple orders of magnitude.

I'm not a fan of the end of life behavior of Intel SSDs either, but it's a consistent policy that does work well for enterprise usage scenarios, with no unexpected data loss. They probably shouldn't be applying the same policy to the consumer products. But the fact that there are valid criticisms to be made doesn't mean this particular one was fair or germane.


Oh, I wasn't even looking at the value of the number. Yeah, it's bad to exaggerate like that without being clear about it. But something performance-heavy can burn out an endurance of 10 drive writes per day pretty fast.


Says over 10M IOPS in the link from microsemi, so… maybe a typo?


1M in NVMe mode, 10M in nonstandard mode. I don't know if there's any software that can use the nonstandard mode.


I am certain the volume of acronyms is intentionally hilarious in the title, so let me pre-empt the complaining by laughing at it. Definitions are in the article.


The article is self-deprecating in this aspect by squeezing in LOL as well:

> yes we found a way to fit another acronym in the piece


ZFS = Z File System, a next-generation file system from Oracle
ZIL = ZFS Intent Log
SSD = Solid State Drive, a type of long-term data storage system with better read/write speeds than an HDD
NAND = not actually an acronym, but a type of logic gate. Not AND.


Oracle? I think you mean Sun.


Hilariously enough, NAND is the only acronym. The others are initialisms.


You don't pronounce 'zill' and 'zef-ess' as acronyms?


Initialisms are acronyms.


Other way around. Acronyms are the subset of initialisms that are pronounceable.


Dictionary.com uses the same definitions for both words, just in a different order. The first definition of acronym is when initials form a pronounced word, which is the second definition of initialism. So they're the same thing, with the difference being vernacular.


So we know the CPUs have IME. What else? Is it conceivable that SSDs do as well? Is any Intel product safe to use?


No, nothing is safe. There's an NSA backdoor in your mouse.


you jest, certainly, yet... now I'm not sure.


They are called implants.[1] To GP, yes even hdds, I suspect similar with ssds.[2]

1: https://www.schneier.com/blog/archives/2014/03/cottonmouth-i...

2: http://spritesmods.com/?art=hddhack&page=1


This article reeks of being "sponsored" by Intel, but there is no mention of it anywhere in the article. So my first question is: what role, if any, did Intel play in the creation of this article?

Second - I find it ironic after all the marketing buzz that Intel has finally admitted that the endurance of Optane is embarrassing compared to NAND devices. This was their original claim:

>When Intel announced 3D XPoint, the company said that it would be 1,000 times faster than NAND flash, 10 times denser than (volatile) DRAM, and with 1,000 times the endurance of NAND

Those claims are so far off from reality that I'm surprised their shareholders haven't sued them over the claims.


ServeTheHome is one of the most honest hardware review sites, especially in the very small field of server/enterprise hardware. AFAIK ServeTheHome is basically advertising for DemoEval which is an independent lab full of servers that you can rent access to. So there's a commercial interest behind it, but it's not Intel.

Here's their policy if you want to believe it: https://www.servethehome.com/about/editorial-copyright-polic...

Looking at Optane specifically, it is just that good. It is 2x-10x better than NAND (at ~2x the price) in certain aspects like mixed read/write or fsync. This is very far from Intel's 1000x claims, but Optane is still worth buying in some cases.

(Also, feel free to call me a paid shill if you want to get deeper in the mud.)


I love servethehome, but find their actual reviews (of which TFA is not one) somewhat frustrating.

Most motherboards they review get a rating of 9.something/10.

They provide some interesting insights but often gloss over aspects that I, at least, really care about.

I could go on, but it's easy to criticise. They're the only ones covering a lot of what they cover. And I love their forums.


Twice the price? Optane seems to be solidly over a dollar per GB, while quite good SSDs are in the 3-4 GB per dollar range.

The pricing is doing a lot better, though. The initial wave of Optane parts were above four dollars per GB, getting unpleasantly close to the price of a ramdisk.


2x-10x better when they claimed 1000x isn't "just that good". ESPECIALLY not when the write endurance is so abysmal.

The fact they don't come out and say, in the review, the hardware was furnished by Intel, and instead you have to actively search out the link you just provided, in and of itself is troubling. It's common courtesy and pretty much industry standard practice to list conflicts of interest at the start or end of an article like the one posted...


Hi tw04 - just as a heads up, none of the SSDs were furnished by Intel. We have a budget and bought all of the drives we used on this test. Some we had to buy second hand, some were scavenged from servers we have had to decommission.

Intel does provide us with hardware, as do AMD, Toshiba, Samsung, WDC, Cavium, etc. In fact, we have a full list of all relationships here: https://www.servethehome.com/about/editorial-copyright-polic...

Unlike virtually every other major review site (and a large number of bloggers), as of today we still do not have direct ad sales to any of these companies.

We also buy an absolute ton of gear for the DemoEval service we run which is why we have access to many different bits. For example, we have purchased a dozen Optane 900p's already since Intel would not furnish us with a U.2 drive.

Intel does not want to have the 900p eat into the P4800X which is a higher margin product. As a result, this is the kind of article (Optane 900p in servers) that Intel specifically would not want us to do. Since we buy hardware, we can.


> Intel does not want to have the 900p eat into the P4800X which is a higher margin product. As a result, this is the kind of article (Optane 900p in servers) that Intel specifically would not want us to do. Since we buy hardware, we can.

Intel has not to my knowledge actively discouraged or tried to prevent anyone from reviewing the 900p in an enterprise context. They haven't expressed any negative opinion about me including the 900p in my review of the P4800X, even though they provided hardware for both.

I agree that they probably don't want the 900p to hurt the sales of the P4800X, but I don't think they're particularly concerned about that happening. Most potential P4800X customers aren't going to go to the trouble of buying and installing large quantities of 900ps instead.


>Intel does not want to have the 900p eat into the P4800X which is a higher margin product. As a result, this is the kind of article (Optane 900p in servers) that Intel specifically would not want us to do. Since we buy hardware, we can.

That's such a cop out and an inaccurate statement. In the article you make it pretty explicit that the 900p is unfit for production workloads (which I agree with). In what world is someone going to use a device that will lose data on power loss for their write cache?? I don't think Intel is the least bit concerned. You then go on to say the 4800 is a better fit, yet you don't show any performance numbers for it because "You will notice that we do not have P4800X results. We had good results, but off of where we would expect" - so you didn't post them??? I'm sorry, that reeks of you being TOLD not to post them by Intel or your "anonymous benchmark requester" - otherwise you would've posted the numbers with the note that you think they seem off. Leaving them out entirely with no explanation as to why is extremely questionable.


Write endurance for the Optane 900p 280GB: 5.11 PB

Write endurance for the Intel S3700 200GB: 10 drive writes per day for 5 years, or 3.65 PB in total.

Write endurance for the Intel S3500 240GB: 140 TB

So I cannot understand why you think the write endurance is lacking.
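
The figures above follow from simple arithmetic: total data written equals capacity times drive writes per day times the number of days. A small sketch that reproduces them; the function names are hypothetical, and the 5-year period applied to the S3500 is an assumption, since only its total is quoted here.

```python
def dwpd_to_pb_written(capacity_gb, dwpd, years):
    """Total rated writes in PB (decimal units) for a drive-writes-per-day rating."""
    return capacity_gb * dwpd * 365 * years / 1_000_000


def pb_written_to_dwpd(capacity_gb, pb_written, years):
    """Inverse conversion: drive writes per day implied by a total-writes rating."""
    return pb_written * 1_000_000 / (capacity_gb * 365 * years)


print(dwpd_to_pb_written(200, 10, 5))              # S3700 200GB at 10 DWPD for 5 years
print(dwpd_to_pb_written(280, 10, 5))              # a 280GB drive at the same rating
print(round(pb_written_to_dwpd(240, 0.14, 5), 2))  # S3500 240GB rated at 140 TB
```

By this arithmetic, 5.11 PB on a 280GB drive is the same as 10 drive writes per day sustained for 5 years, while 140 TB on a 240GB drive works out to roughly a third of a drive write per day over the assumed 5 years.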


Please stop trolling. The author purchased the disk immediately after the release. Multiple disks were purchased from Newegg, as recorded publicly here:

https://forums.servethehome.com/index.php?threads/intel-opta...

I quote -

"Just ordered another. Sold by newegg. Limit 1 now."

The write endurance is much better than that of NAND products on the market. Try comparing it with Intel's previous generation or the current generation from Samsung.

Get a real life and learn how to grow up, stop spreading completely false info. Internet is not built for you to troll.


I think you're tricking yourself into a wrong conclusion. What was promised doesn't matter (to a customer); all that matters is the price and performance.


Ignoring the difference between 3D XPoint memory and Optane-branded products based on that memory is a good way to reach bad conclusions.

But even accounting for that, Intel's definitely not hitting their goals for endurance yet, and the density goal is the only one they clearly have come close to. We won't be able to judge performance until there are 3D XPoint NVDIMMs that get the PCIe bus out of the way.


Why should it be an advertisement bought by Intel? Optane disks are technically the best at the moment for a ZFS SLOG, but nobody had tested that in practice yet, so they did it, and did it very well.


No matter what claims were made in the past, it is crystal clear that the Optane 900P is a killer device for ZIL-style access patterns. This has been verified by numerous sources, independent or sponsored. As explained in the article, the expected life span of the 900p is also amazing, which makes it suitable for enterprise use. That is another huge bonus.

What Intel offered here is a $300 device doubling your ZFS throughput performance and making the latency 10x smaller. As of writing, there is no competitor offering anything similar in that space.

Optane is still new, so there is of course room for further improvement. I'd look at that in a very positive way: more exciting products are coming.


Perhaps you missed this part?

> The genesis of this project was that a user requested we setup a custom demo in our DemoEval lab to compare drives using this specific workload. The individual wanted to compare a few of their existing NAND solutions to Optane.



