Unless it does something very weird it won't trigger all those files to download at the same time. That shouldn't be a worry.
And, as a separate note, they shouldn't be balking at the amount of data in a virtualized OneDrive or Dropbox either, considering the user could get a many-terabyte hard drive for significantly less money.
> Unless it does something very weird it won't trigger all those files to download at the same time. That shouldn't be a worry.
The moment you call read() (or fopen() or your favorite function), the download will be triggered. It's a hook sitting between you and the file. You can't ignore it.
The only way to bypass it is to remount it with rclone or something and use its "ls" and "lsd" commands to query filenames. Otherwise it'll download, and that's how it's expected to work.
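On Windows you can at least tell which files are still placeholders without pulling them down: directory listings and stat() don't hydrate a file, only reading its data does. A rough Python sketch, assuming a Windows Files On-Demand setup (the path is made up, and the RECALL_ON_DATA_ACCESS constant is hard-coded because Python's stat module may not export it by name):

    # Sketch: list a cloud-synced folder and flag placeholder ("online-only") files
    # without reading their contents. stat() and scandir() do not hydrate a file;
    # only reading its data does.
    import os
    import stat

    FILE_ATTRIBUTE_RECALL_ON_DATA_ACCESS = 0x00400000  # documented Win32 value; hydrate-on-read placeholder

    def is_placeholder(entry: os.DirEntry) -> bool:
        attrs = entry.stat(follow_symlinks=False).st_file_attributes
        return bool(attrs & (stat.FILE_ATTRIBUTE_OFFLINE
                             | FILE_ATTRIBUTE_RECALL_ON_DATA_ACCESS))

    for entry in os.scandir(r"C:\Users\me\OneDrive"):  # hypothetical path
        if entry.is_file(follow_symlinks=False):
            state = "online-only" if is_placeholder(entry) else "local"
            print(f"{state:12} {entry.name}")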
I think you might be confusing Backblaze reading files and how Dropbox/OneDrive/Nextcloud/etc. work. NC doesn't enable this by default (I don't think), but Windows calls it virtual file support.

There is no avoiding filling the upload buffer, because Backblaze has zero control over how Dropbox downloads files. When Backblaze requests that a file be opened and read, Windows will ask Dropbox (or whatever) to open the file for it and read it. How that is done is up to whatever handles the virtual files. To Backblaze, your Dropbox folder is a normal directory, with all that that entails, so Backblaze thinks it can just zip through the directory and read data from disk, even though that isn't really what's happening.

I had to exclude my Nextcloud directory from my Duplicati backups for precisely this reason -- my Nextcloud is hosted on my server, and Duplicati was sending it so many requests it would cause my server to start returning error 500s.
And no, my server isn't behind Cloudflare, primarily because I don't have $200 to throw at them to allow me to proxy arbitrary TCP/UDP ports through their network, and I don't know how to tell CF "Hey, only proxy this traffic but let me handle everything else" (assuming that's even possible, given that the usual flow is to put your entire domain behind them).
Dropbox and OneDrive can handle Backblaze zipping through and opening many files. The risk is pulling down too many gigabytes at once, but that shouldn't happen, because Backblaze should only open enough for immediate upload. If it does happen, it's very easily fixed.
If it overloads Nextcloud by hitting too many files too fast, that's a legitimate issue, but it's not what OP was worried about.
The issue you’re missing is that the abstraction Dropbox/OneDrive/etc. provide is not that of a network file system. When an application triggers the download of a file, it hydrates the file to the local file system and keeps it there. So if Backblaze triggers the download of a TB of files, it will consume a TB of local file system space (which may not even exist).
It does keep them permanently. Dropbox is not a NAS and does not pretend to be one.
> When you open an online-only file from the Dropbox folder on your computer, it will automatically download and become available offline. This means you’ll need to have enough hard drive space for the file to download before you can open it. You can change it back to online-only by following the instructions below.
Same exact behavior for OneDrive, though it apparently does have a Windows integration to eventually migrate unused files back to online-only if enabled.
> When you open an online-only file, it downloads to your device and becomes a locally available file. You can open a locally available file anytime, even without Internet access. If you need more space, you can change the file back to online only. Just right-click the file and select "Free up space."
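So if a backup pass does hydrate a pile of files, you can push them back to online-only afterwards instead of right-clicking each one. A rough sketch, assuming Windows Files On-Demand and attrib.exe's pin/unpin switches (the folder path is made up):

    # Sketch: after a backup run has hydrated files, mark them "unpinned" again so
    # OneDrive can dehydrate them back to online-only.
    import os
    import subprocess

    backup_root = r"C:\Users\me\OneDrive\Archive"  # hypothetical path

    for dirpath, _dirnames, filenames in os.walk(backup_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # -P drops any "always keep on this device" pin; +U asks OneDrive to
            # free up the local copy when it next gets the chance.
            subprocess.run(["attrib", "-P", "+U", path], check=True)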
Maybe it will, maybe it won't, but it'll cycle every file in the drive and stress everything from your cloud provider to Backblaze, including everything in between, both software- and hardware-wise.
That sounds very acceptable to get those files backed up.
It shouldn't stress things to spend a couple of weeks relaying a terabyte in small chunks. The most likely strain is on my upload bandwidth, and yeah, that's the cost of cloud backup; more ISPs need to improve upload.
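For scale (assumed sustained upload rates, nothing else):

    # Rough sanity check on the "couple of weeks per terabyte" figure.
    terabyte_bits = 1e12 * 8
    for mbit in (10, 25, 75):                     # assumed sustained upload speeds
        days = terabyte_bits / (mbit * 1e6) / 86_400
        print(f"{mbit:>3} Mbit/s -> {days:.1f} days per TB")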
I mean, cycling a couple of terabytes of data over a 512 GB drive is at least 4 full drive writes, which is too much for that kind of thing.
> more ISPs need to improve upload.
I was yelling the same thing into the void for the longest time; then I had the brilliant idea of reading the technical specs of the technology coming into my home.
Lo and behold, the numbers I was getting were the technical limits of the technology I had at home (PON for the time being), and going higher would need very large and expensive rewiring, with new hardware and technology.
4 writes out of what, 3000? For something you'll need to do once or twice ever? It's fine. You might not even eat your whole drive-writes-per-day (DWPD) quota for the upload duration, let alone the entire month.
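Back-of-the-envelope, with an assumed endurance rating (consumer 512 GB drives are typically specced somewhere around 300 TBW):

    # Illustrative numbers only: how much of a 512 GB drive's rated endurance does
    # cycling ~2 TB of cloud files through it consume?
    drive_size_tb = 0.512
    data_cycled_tb = 2.0
    rated_tbw = 300.0            # assumed endurance rating, terabytes written

    full_drive_writes = data_cycled_tb / drive_size_tb   # ~3.9 full drive writes
    share_of_endurance = data_cycled_tb / rated_tbw      # ~0.7 % of rated TBW

    print(f"{full_drive_writes:.1f} full drive writes, "
          f"{share_of_endurance:.1%} of rated endurance")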
> 4 writes out of what, 3000?
Depends on your device capacity and how much of it is in actual use. Wear leveling also adds wear while it shuffles things around.
> For something you'll need to do once or twice ever?
I don't know about you, but my cloud storage is a living thing, and even if it weren't, if the software can't smartly skip files, it'll pull everything in, compare, and pass without uploading, causing churn in every backup cycle.
> Isn't that usually symmetrical? Is yours not?
GPON (Gigabit PON) is asymmetric. The theoretical limits are 2.4 Gbps down and 1.2 Gbps up. I have 1000 Mbit down / 75 Mbit up at home.
How do you know whether those files need to be backed up without reading them? Timestamps and sizes are not reliable, only content hashes are. How do you get a content hash? You read the file.
If timestamps aren’t reliable, you fall way outside the kind of user who can trust a third-party backup provider. Name a case where the modification timestamp fails but a cloud provider would still catch the need to download the file.
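That's basically the rsync-style quick check every sane backup tool does: compare size and mtime against the last run's manifest and only read (and therefore hydrate) files that look changed. Rough sketch, manifest format made up:

    # Sketch of the usual "quick check": skip a file when its size and mtime match
    # the previous run's manifest; only read it when they differ or it's new.
    import hashlib
    import os

    def needs_backup(path: str, manifest: dict) -> bool:
        st = os.stat(path)
        prev = manifest.get(path)
        if prev and prev["size"] == st.st_size and prev["mtime"] == st.st_mtime:
            return False          # unchanged by the cheap check: never read the file
        return True               # size/mtime changed (or new file): read and re-upload

    def record(path: str, manifest: dict) -> None:
        st = os.stat(path)
        h = hashlib.sha256()
        with open(path, "rb") as f:                  # this read hydrates a placeholder
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        manifest[path] = {"size": st.st_size, "mtime": st.st_mtime, "sha256": h.hexdigest()}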