Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

P.S. Even fragmentation can leak data. E.g. search for the most fragmented portion of the document in the DB, and you'll likely find out the most heavily edited paragraph of the document.


I believe that you raise excellent concerns. I also believe the solutions are perhaps just a bit of thoughtful UI design away.

The options could be something like:

    ----Performance & Privacy-----
    
    Document Save Mode
    
    [x] Balanced. Document history is not shared. (Default)
    [ ] Fastest. Not recommended for sensitive data 
    [ ] Maximum privacy. Slowest, but most secure

    Click for more information on what these choices mean
The first option would zero out previous edits. This would be the default. This would still be faster than re-zipping the entire archive -- you'd only be zero'ing out a single file. Metadata could still be inferred via fragmentation as you say, but I do not believe this is a concern for most use cases.

The second option would optimize purely for speed.

The third option would optimize purely for privacy. Perhaps the entire document database would be rewritten from scratch on every save. This probably wouldn't be too much slower than the current XML dance.

I understand the value of this kind of "paranoid" mode but I think it's overkill for most situations. Personally, the situations where I'd want this kind of behavior are extremely rare. Clearly some people would need/want this and that's fine; it would be only a click away. And then all future documents could inherit this choice until the app is told otherwise.


This UI solves a problem for the computer, not the user. Whenever you find yourself making a UI like this, you should go back and change your program so the computer doesn't have that problem any more. In this case, there's surely a way to write files quickly enough while still maintaining user privacy -- just do that instead of bothering me with a wall of text every time I save a file!


     just do that instead of bothering me with a wall of text every time I save a file!
I certainly wouldn't want to be hit with this wall of text every time I save a file. I was thinking that this sort of option ought to be a global setting that could perhaps be altered on a per-document basis if (and only if) the user cares to muck about with it.

It could, perhaps, be a choice that is presented to the user upon new document creation.

     This UI solves a problem for the computer, not the user. [...]
     In this case, there's surely a way to write files quickly 
     enough while still maintaining user privacy
Life is full of these choices. There's a gas pedal in my car. I can mash it (to go fast but burn a lot of fuel) or be gentle with it (and save gas, but I won't go as fast). Ideally the car should be crazy fast and crazy safe and crazy fuel efficient no matter how you drive the sucker, but that's a tough engineering challenge and ultimately we (today) leave that choice to the drive.

That said, you are 100% correct. Maybe the privacy/paranoia steps are something that could be done on a background thread. The file could be saved quickly, and then the database could be privacy-optimized in the background immediately thereafter.


That's a terrible UI. First of all, it's an issue that users wouldn't even know exists, so they'll probably never look for the options anyway. And the proposed default leaks data, a terrible choice! Secondly, why are there three options? Two of them leak data, one doesn't. Who in their right mind would pick the "leak some data" option?

Leaking sensitive data that users would expect to be deleted is a bug. People don't want their software to be full of configuration options that amount to "fix this security hole, but run really slowly" or "leave this security hole open, and run faster".


    And the proposed default leaks data
The suggested default ("don't share edit history, but don't go 'full paranoid' and vacuum/defragment the database on every save") wouldn't leak data. It would, at times, leak information from which some metadata could sometimes be gleaned. Most edits would presumably not even cause fragmentation unless they spanned more than a single data page in Sqlite's on-disk storage format.

If you counter that "metadata is data; therefore it does leak data" I suppose you'd be correct in a very strict sense. But, regardless of semantics, I think "sometimes hinting at which parts of the file have been edited" is quite a bit different from "leaking a document's edit history, verbatim" -- in much the sense that a common cold is different from HIV despite the fact that they're both viruses. (Though, I suppose they can both kill you if the conditions are right)

I can certainly think of situations where a user wouldn't want to leak such vague metadata, but that seems to me like an edge case and we as developers should strive for sensible defaults. I can't think of too many instances when I would have cared if somebody could perhaps infer which part of a document had been edited most often.

That said, yeah, offering choices like this is clunky. Maybe the privacy stuff could just happen automatically in the background following a save, or at export time, etc.


> wouldn't leak data

They will leak data, e.g. deleted information. Search “SQLite forensics” for technical details.

> Most edits would presumably not even cause fragmentation

You replace “a” with “the” in the middle of a large table, and generally the table becomes fragmented. The way it’s fragmented indicates not only where it was edited, but also what kind of edits were made, insertions and deletions fragment database differently.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: