
You've touched on the problem but not quite nailed it.

> You need to deal with file handles / etc, but that can be done too.

That's actually the hard part. To get a true point-in-time image of that process, you need to snapshot the full filesystem state too; otherwise it could change out from beneath your program. Even more complicated: network state.



Why is the network more complicated? I would think the network doesn't have the kind of atomic/uninterruptible states a filesystem might.


It's easy to re-open a file (assuming it's still there), but with sockets your IP may have changed, the remote IP may have changed (and you may have stored it in working memory that got checkpointed), DNS may point you to a different service entirely, and you may have had to do some kind of port knocking or something to get that connection open in the first place.

I know this kind of stuff is being worked on so VMs/containers/namespaces can be moved around but it seems to be one of those things that gets really complicated when you try to do it transparently for userspace.
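For what it's worth, the "easy" file case above can be sketched roughly like this. It is a minimal illustration that assumes a read-only binary handle and a path that still exists at restore time; `checkpoint_handle` and `restore_handle` are made-up names, not part of any real checkpoint tool.

```python
def checkpoint_handle(f):
    """Record just enough to re-open a read-only file later (hypothetical helper)."""
    return {"path": f.name, "offset": f.tell()}


def restore_handle(state):
    """Re-open the saved path and seek back to the saved offset.

    Raises FileNotFoundError if the file has gone away in the meantime,
    which is the "assuming it's still there" caveat.
    """
    f = open(state["path"], "rb")
    f.seek(state["offset"])
    return f
```

There is no equivalent two-liner for a connected socket, because the state that matters lives partly in the kernel and partly on the remote end.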


IPs and DNS settings change under running programs all the time; that doesn't seem as unusual as re-opening a file that's no longer there. A Unix socket is an interesting mixed case :)


If a process has a stream socket open to another process, or to another system over the network, what happens to that socket when the process is "thawed"?

How about if it's listening on a TCP port -- what happens if that port is in use by another process when the original one is thawed?
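To make the listening-port question concrete, here is a small sketch of what a restore step would run into if it simply tried to bind the old port again. `try_rebind` is a hypothetical helper, and the loopback address and plain TCP socket without `SO_REUSEADDR` are assumptions for illustration only.

```python
import socket


def try_rebind(port, host="127.0.0.1"):
    """Try to listen on (host, port) again, as a naive restore might.

    Returns None on success, or the errno of the failure.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind((host, port))
        s.listen(1)
        return None
    except OSError as e:
        return e.errno
    finally:
        s.close()
```

If another process is already listening on that port, the bind fails with `EADDRINUSE`, and the thawed process has no reason to expect that from a socket it thought it already owned.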


I understand this can't go 'right', but are those things more difficult than file handles to files that have been deleted?


Handles to deleted files are relatively uncommon in practice. Network sockets aren't.


"Handles to deleted files are relatively uncommon in practice."

Could you please expand on your reasoning here? We're talking about restoring processes at arbitrary points in the future. That means we're not just talking about handles to files that were deliberately deleted while the process was running, but also anything that the process had open that was frozen that may have been subsequently deleted. That would seem to include any log file that gets rotated, which is not exactly rare, plus a ton more things.

I also think that treating network sockets as if they were disconnected is likely to go better than treating files that way - existing programs probably make more assumptions about disk state not changing unexpectedly than about network state not changing unexpectedly (even if both are technically not well founded).
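As a rough illustration of the log-rotation case, this is one way a program on a Unix-like system could check whether a handle it holds still refers to the file currently at that path. `handle_status` is a made-up name, and comparing device and inode numbers is just one possible approach, not something any particular checkpoint tool is known to do.

```python
import os


def handle_status(f):
    """Compare an open file object with whatever now lives at its path.

    Returns "same", "replaced" (path exists but is a different file,
    e.g. after log rotation), or "deleted" (path no longer exists).
    """
    held = os.fstat(f.fileno())
    try:
        current = os.stat(f.name)
    except FileNotFoundError:
        return "deleted"
    if (held.st_dev, held.st_ino) != (current.st_dev, current.st_ino):
        return "replaced"
    return "same"
```

Most programs never make a check like this, which is the point: they assume the handle and the path keep meaning the same thing.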


IIRC the CRIU developers went into some detail about this on FLOSS Weekly some time back:

https://twit.tv/shows/floss-weekly/episodes/334

I can't remember exactly where in the podcast they discussed it, but I believe it was just before the part where you could hear brains exploding in the background.


Elaborate? I don't think there's much of anything that could change out from under a suspended process that couldn't change out from under a running process.

(Case in point: you can have a system hibernate, have a supposedly locked file change, and have the system resume.)



