Improving Linux performance by preserving Buffer Cache State

binarycrusader · on July 7, 2012

Be wary; the POSIX interface here is purely advisory. That is, the specification doesn't require the implementing OS to act on the advice provided via this interface. As a result, using it may not change system behaviour at all.
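To make the advisory point concrete, here is a minimal Python sketch, assuming Linux and Python 3.3+ (where `os.posix_fadvise` is exposed). The helper name `advise_dontneed` is made up for illustration. The thing to notice is that a successful return only means the kernel accepted the call, not that it evicted anything.

```python
import os
import tempfile

def advise_dontneed(path):
    """Hint that cached pages of `path` are no longer needed (hypothetical helper).

    Returns True if the kernel accepted the call. That is NOT proof that any
    page was dropped: the advice is only a hint.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        # offset=0, len=0 means "from the start to the end of the file".
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
        return True
    except OSError:
        # The call itself was rejected (for example, unsupported descriptor).
        return False
    finally:
        os.close(fd)

# Small demonstration on a throwaway file.
with tempfile.NamedTemporaryFile() as tmp:
    tmp.write(b"x" * 8192)
    tmp.flush()
    accepted = advise_dontneed(tmp.name)
print("advice accepted:", accepted)
```

Whether the pages actually left the cache would have to be checked separately, for instance with a tool built on `mincore()`.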

malkia · on July 7, 2012

Btw, on Windows, if you want to purge a specific file from the cache, all you need to do is reopen the file with FILE_FLAG_NO_BUFFERING and/or FILE_FLAG_OVERLAPPED, and then close it.

This can be verified with Sysinternals RAMMap.

ComputerGuru · on July 7, 2012

I don't know that this accomplishes the same thing. It's not a question of getting this file/data out of the cache so much as it is about not replacing existing cache contents with this one.

Removing a file from the cache after the fact doesn't address the problem that it kicked some other data out of the cache to take its place in the first place.

malkia · on July 7, 2012

I agree. But let's say that you have a long list of files to be copied from the server to your machine. If you "free" each file after copying it, then the harm done is at most the size of the biggest file.

Now, the copying itself could've been done using NO_BUFFERING, but that isn't an option if it's done by a program you don't have access to (or if it isn't straightforward copying but, say, rsync (DeltaCopy) or something like that).

It's not really the same thing you're describing, but it is related.

We had to do this at our studio: a process was copying lots of fresh sound banks for the game, and it was trashing the cache, which is normally filled with the game assets used during level building. Originally we thought of copying the files directly using NO_BUFFERING, but because the app was written in C#, that was a bit harder (and we didn't want to introduce insecurities). So the programmer in charge just added one more Open/Close with NO_BUFFERING after each file was copied, which purged the file from the cache.

Obviously this wouldn't work if, instead of many sound banks, there were one huge file taking up all the space. But since that was not the case, we took the opportunity.
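For comparison, a rough Linux analogue of this "copy, then free each file" idea can be sketched with `posix_fadvise`. This is not what the studio did (that was on Windows); it is a minimal sketch assuming Linux and Python 3.3+, and the names `drop_from_cache` and `copy_then_drop` are hypothetical.

```python
import os
import shutil
import tempfile

def drop_from_cache(path):
    """Ask the kernel to drop cached pages for `path` (advisory only)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # Dirty pages cannot be dropped, so flush them to disk first.
        # Linux accepts fsync() on a read-only descriptor.
        os.fsync(fd)
        # offset=0, len=0 covers the whole file.
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)

def copy_then_drop(src, dst):
    """Copy one file, then hint that neither copy needs to stay cached."""
    shutil.copyfile(src, dst)
    drop_from_cache(dst)
    drop_from_cache(src)

# Small demonstration with a throwaway "sound bank".
with tempfile.TemporaryDirectory() as tmpdir:
    src = os.path.join(tmpdir, "bank.src")
    dst = os.path.join(tmpdir, "bank.dst")
    payload = os.urandom(64 * 1024)
    with open(src, "wb") as f:
        f.write(payload)
    copy_then_drop(src, dst)
    with open(dst, "rb") as f:
        copied = f.read()
print("copy intact:", copied == payload)
```

Calling this once per file keeps the cache footprint of the copy bounded by the largest file, as described above. It shares the limitations already raised in the thread: it does not bring back whatever was evicted during the copy, and it also drops pages of these files that were cached before the copy started.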

ComputerGuru · on July 7, 2012

Thanks for explaining the rationale behind such a use case. Sound reasoning indeed.

paulsutter · on July 7, 2012

This is a terrific post. I've looked elsewhere for specific information on how Linux deals with posix_fadvise, and haven't found this clarity before.

cbsmith · on July 7, 2012

The short answer is "poorly". Linus complains about O_DIRECT, but Linux apps that are buffer cache sensitive end up using it (which in a lot of ways is worse) simply because fadvise() and similar functions have never been done properly.

paulsutter · on July 7, 2012

This post explains some of the subtleties that in the past had made me question whether fadvise() worked at all. Admittedly, I still find unbuffered I/O to be the simplest and most predictable. But Linus' basic arguments against unbuffered I/O have been reasonable, and I feel more settled about the matter now that I understand fadvise() can be made to work.

How would you improve fadvise()? And what problems have you found with O_DIRECT?

cbsmith · on July 7, 2012

The problem with O_DIRECT is that it pretty much puts each app in the business of doing its own buffer cache, which bypasses the kernel's ability to look at the system holistically and make decisions about how to buffer data. Compound that with fairly inconsistent contracts around the interface and its semi-synchronous behaviour... ick.

As for improving fadvise()? I'd like to see FADV_SEQUENTIAL (or perhaps a variant) not just double the read-ahead buffer, but also dump pages immediately after they've been read unless there is another FD open somewhere else (you can keep calling fadvise with FADV_DONTNEED, but that's lame on several levels). I'd like to see semantics that make it clear to the kernel that data you are writing to a file (particularly if it is in append mode) likely won't be read for a very long time, so it can minimize polluting the buffer cache with freshly written data. I'd like to see fadvise() calls that specify a portion of a file only affect that portion of the file. What'd be REALLY nice would be a way to express "buffer part X and Y of the file, but if you are under pressure, dump Y before you dump X".
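The "keep calling fadvise with FADV_DONTNEED" workaround mentioned above looks roughly like the following. This is a minimal sketch assuming Linux and Python 3.3+; `stream_once` and the 1 MiB `CHUNK` size are arbitrary choices for illustration, and the kernel remains free to ignore every hint.

```python
import os
import tempfile

CHUNK = 1 << 20  # 1 MiB per read; an arbitrary choice

def stream_once(path, consume):
    """Read `path` front to back, hinting after each chunk that it won't be reused.

    Returns the number of bytes read.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        # Tell the kernel we intend to read sequentially.
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
        offset = 0
        while True:
            data = os.read(fd, CHUNK)
            if not data:
                break
            consume(data)
            # Advise that the range just consumed is no longer needed.
            os.posix_fadvise(fd, offset, len(data), os.POSIX_FADV_DONTNEED)
            offset += len(data)
        return offset
    finally:
        os.close(fd)

# Small demonstration on a throwaway file slightly over two chunks long.
with tempfile.NamedTemporaryFile() as tmp:
    tmp.write(os.urandom(2 * CHUNK + 123))
    tmp.flush()
    seen = []
    total = stream_once(tmp.name, seen.append)
print("bytes streamed:", total)
```

The extra system call per chunk, and the fact that it evicts pages another process may still want, are part of why this is described as lame.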