Ask HN: Secure automated backup?
62 points by jamesknelson on May 13, 2017 | 86 comments
Hi HN,

After seeing all the news about lost data recently, I need to get my arse into gear and get an automated backup set up properly.

I'm using a Mac, so I looked into the Time Capsule. That said, if one of the data loss scenarios is a well-written ransomware worm, it feels like the Time Capsule is going to be just as vulnerable as my main machine.

What approach would you recommend to back up data, with both hard drive failure and ransomware in mind? I'm open to cloud based solutions if that actually makes more sense.



I use borg[0] to create local space efficient encrypted backups and rclone[1] to mirror the archives to Google Drive. I wrote a short script to automate it and schedule it to run every night.

[0] https://borgbackup.readthedocs.io/en/stable/

[1] https://rclone.org
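A minimal sketch of such a nightly script, assuming a borg repository has already been initialized with `borg init` and an rclone remote named `gdrive` has been configured (the repo path and remote name here are hypothetical):

```shell
#!/bin/sh
# Nightly backup: borg archive to a local encrypted repo, then mirror to
# Google Drive with rclone. Schedule via cron or launchd.
set -eu

REPO="$HOME/backups/borg-repo"   # local borg repository (created with `borg init`)
REMOTE="gdrive:borg-repo"        # rclone remote (created with `rclone config`)

# Skip gracefully on machines that aren't set up yet.
if ! command -v borg >/dev/null 2>&1 || ! command -v rclone >/dev/null 2>&1 \
   || [ ! -d "$REPO" ]; then
    echo "borg/rclone/repo not available; nothing to do" >&2
    exit 0
fi

# Space-efficient, deduplicated, dated archive of the home directory.
borg create --compression lz4 --exclude "$HOME/.cache" \
    "$REPO::{hostname}-{now:%Y-%m-%d}" "$HOME"

# Keep a bounded history so the repo doesn't grow without limit.
borg prune --keep-daily 7 --keep-weekly 4 --keep-monthly 6 "$REPO"

# Mirror the already-encrypted archives to Google Drive.
rclone sync "$REPO" "$REMOTE"
```

Since the repo is encrypted by borg before rclone ever touches it, Google only sees ciphertext.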


You don't have the Google Drive app installed anywhere?

If you do, this setup doesn't help recovery after a cryptolocker. The encrypted backup would also be unusable.


Currently at $WORK using Attic (which Borg was forked from), with plans to migrate to Borg.

At home, rsync to NAS and ZFS snapshots.


I'm surprised that no one mentioned Tarsnap yet, it's run by a well known HNer (cperciva): http://www.tarsnap.com/

It's not exactly noob friendly though.


The main issue we had with Tarsnap is that it's very slow to restore if you have big backups. Very slow. Big in this case means on the order of a TB or so.

We had an incident recently where we needed to restore some data from backups, and it took literally days to get the files we needed back. We were not downloading the entire backup; we just wanted to restore a small subset of files.

We migrated away after that.


Can you elaborate on which part of the restore process was slow (downloading the files themselves or something else)?


Duplicity does everything that Tarsnap does, but it's free software and you can point it at whatever storage you want, including your own disks (optionally over an SSH connection), or Amazon S3.

It's really easy to set up too: https://www.grepular.com/Secure_Free_Incremental_and_Instant...


Amazingly expensive though, at $0.25/GB/month.

I pay Backblaze B2 about $1.60 a month for 280 GB of photos, a number that doubles every few years. Today that would cost me $60 on Tarsnap. That's not reasonable.


I did the same until Backblaze lost all the data from one of my drives. They didn't alert me about it; I discovered it abruptly when attempting to restore after the physical drive died. Dealing with their customer support also felt very insecure (sharing your secret in plain text), and they still weren't able to recover the files. I don't see myself using or recommending it to anyone again.


I feel for you (losing backup data isn't too bad, but not knowing about it is), but I could afford to keep almost 40 separate copies on Backblaze for the same price as one on Tarsnap.

I could keep one on B2 and one on S3 and one on Glacier... And still have saved $480 each year over one copy on Tarsnap.

It's too expensive.


Did you post a writeup of this somewhere online you can link to?


I wrote a draft of it but didn't publish because it just felt like a rage post. I found it difficult to separate the objective events from the frustration of data loss. That said, there probably would be value as a lot of people blindly recommend Backblaze but haven't experienced edge cases like this.


Well it's quite expensive compared to other solutions.


It really isn't. I mean, it looks expensive until you take into account all the dedup and everything it does to reduce the cost. I put in $10 a few years ago, and I still have almost a $4 balance ($3.91).


Well, I guess it depends on what you store. I am backing up 1.5 TB of photos and videos, which can't be compressed or deduplicated much, so in my case it would be more expensive than many other solutions.


It depends on what you're backing up. If you're generating a lot of unique non-dedupable assets (eg you're a photographer who keeps their raws) it quickly becomes very expensive.

Don't get me wrong, I love tarsnap and used to use it, but for a lot of workloads it does not compete on price.


How much data are you storing?


Not much, obviously; I am only storing critical data in Tarsnap. Like others have said, if you are storing hundreds of GBs or TBs of data, rsync.net or something similar may be a better solution.


To concur with my sibling commenter zie (who said that tarsnap is not that expensive):

  $ tarsnap --print-stats
                         Total size     Compressed size
  All archives           1418275826434  1212480481405
    (unique data)        5342739088     2957847871
I.e., I have ~1.3 TB of data stored in total, but I am only being charged ($0.0239 US per day) for what amounts to ~2.75 GB after deduplication and compression.

Mind you, I have not compared tarsnap's prices to those of equivalent services, but it seems pretty reasonably-priced to me.
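As a sanity check on those stats: Tarsnap's published storage price is 250 picodollars per byte-month (i.e. $0.25/GB-month), and only unique post-compression bytes are billed. Plugging in the numbers from `--print-stats` above:

```python
# Back-of-the-envelope cost check using Tarsnap's published storage price.
STORAGE_PRICE = 250e-12  # dollars per byte per month (250 picodollars)

total_size = 1_418_275_826_434    # "All archives" total, in bytes (~1.3 TB)
unique_compressed = 2_957_847_871  # unique data after dedup + compression (~2.75 GB)

monthly = unique_compressed * STORAGE_PRICE
print(f"dedup+compression ratio: {total_size / unique_compressed:.0f}x")
print(f"storage cost: ${monthly:.2f}/month, ~${monthly / 30:.4f}/day")
```

The per-day figure comes out to roughly the $0.0239 quoted above (exactly so for a 31-day month), with a deduplication ratio of nearly 480x.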


Right, all providers are cheap if you barely use any storage.


Yeah, I am surprised at how cheap Backblaze is, according to one of the comments here.

I will say, though, that what I've read about tarsnap's security, plus my experience with it as a user, makes me reluctant to switch. I guess I might look into alternatives once I need to store more data.

I have had some really bad experiences with other backup systems so having something which I am quite sure is very secure and which works reliably is great. The slow restore times somebody else mentioned might be a problem though. I've only had to restore small archives/files.


1.3 TB deduplicating & compressing down to 2.75 GB seems way too good to be true. Is there something weird about your data that leads to this level of compression?


Yeah, apart from 1.6GB of documents that are mostly PDF, it's all text (mostly code). And I back up multiple times a day so there's a lot of duplicate data.


There's a common rule called the 3-2-1 rule. It states that you should:

- Have at least three copies of your data.

- Store the copies on two different media.

- Keep one backup copy offsite.

Personally, I'd recommend:

Copy 1: Your Mac.

Copy 2: A local NAS (my personal choice) or hard disk.

Copy 3: A remote backup, stored on a hard drive in a desk drawer at work, Backblaze, Google Drive, Amazon Cloud Drive or whatever other solution suits your needs.

In terms of software, I personally use rsync + ZFS/BTRFS snapshots (NAS - local, NAS2 - remote) and rclone (cloud). I haven't really used fancy solutions like Attic and Borg due to their need to write dead (i.e. not mountable without a performance penalty) data to local disk or SSH. No affordable storage that I've found offers this (rsync.net offers it but is too expensive).

It's getting to the point where I'm seriously considering buying an LTO6/7 tape drive though...

I'll also add because I haven't seen it elsewhere: verify your backups. A backup is pointless unless you know you can restore it. The best way to test this is by doing it. It should get to the point where you don't fear a restore. It shouldn't be painful. There should be no worry. It should be no more than an inconvenience. When something goes wrong, you don't want there to be even the smallest hint of doubt that there's something wrong with your process.

As such, I strongly recommend having an easily accessible backup. I'd go for a spare HDD sitting in a desk drawer at home before going for cloud backups just so that you can test it frequently.
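One concrete way to run that verification drill: restore a backup into a scratch directory, then compare it byte-for-byte against the live data. A minimal sketch (not tied to any particular backup tool):

```python
# Restore drill: walk the original tree and flag any file that is missing
# from, or differs in, the restored copy.
import filecmp
import os

def verify_restore(original, restored):
    """Return a list of original paths that are missing or differ in `restored`."""
    problems = []
    for root, _dirs, files in os.walk(original):
        for name in files:
            src = os.path.join(root, name)
            dst = os.path.join(restored, os.path.relpath(src, original))
            # shallow=False forces a full content comparison, not just stat().
            if not os.path.isfile(dst) or not filecmp.cmp(src, dst, shallow=False):
                problems.append(src)
    return problems
```

An empty result means the restore matched; anything else is exactly the doubt you want to discover during a drill, not during a disaster.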


It's also worth thinking about time to restore. If you have hundreds of GB worth of backups it could take a very long time to restore everything from the internet. Keeping an easily accessible backup around is really worth it.


I use Arq (https://www.arqbackup.com) with Amazon Drive (unlimited data for $60/year) for this


I also use Arq but send to rsync.net with reduced pricing (http://rsync.net/products/attic.html) in addition to SFTPing to a personal (offsite) server.

Additionally, I run Backblaze and use Carbon Copy Cloner roughly once a week to clone my entire drive to an external drive.

For personal servers I use borg with the same reduced rsync.net pricing.


rsync.net is fantastic.

You don't get the read-only snapshots with the reduced borg or attic accounts though (they expect you to manage all increments using those programs).

If you are prepared to pay the standard rate, the read-only snapshots can't be destroyed by any hacker or ransomware.


In case it makes a difference for anyone: Amazon Drive does not allow commercial use. It does seem like the best deal price-wise right now.


I have a setup which works really well for my photos and videos [1][2][3][4][5]. It automatically keeps a copy of each file in 3 locations; my laptop, a Synology NAS and Google Drive / Photos.

[1] https://medium.com/@jmathai/introducing-elodie-your-personal...

[2] https://medium.com/@jmathai/understanding-my-need-for-an-aut...

[3] https://medium.com/@jmathai/my-automated-photo-workflow-usin...

[4] https://medium.com/@jmathai/one-year-of-using-an-automated-p...

[5] https://medium.com/vantage/how-to-protect-your-photos-from-b...


I used to use Crashplan, which had unlimited storage and was fairly cheap (like $4/month or something) for a family plan.

You might want to check it out. https://www.crashplan.com/en-us/features/

Also, it was one of the few services that had a client that worked on Linux.


I've been using Crashplan for my extended family as well, you get 10 machines with unlimited storage for something like $120 per year. Linux client has been working great, I'd definitely recommend them.


I used to use Crashplan too. The Java client brought my rMBP to a halt when it spun up though. It made the computer practically unusable.


Seconding CrashPlan; they even allow you to encrypt with your own key (just don't lose it!). The Linux client is fantastic.


"used to use" is an interesting endorsement. Why don't you use it anymore?


My plan had expired, and I ended up not renewing (since I was cutting down on the third-party services I use). So I just moved to an external hard disk and "Back in Time" on Linux.

But I've been looking at re-subscribing.


Here are some options that I have experience with:

- Time Machine with offline disks: Since Time Machine supports multiple backup destinations, you can use a Time Capsule or hard drive that's always connected to your Mac, and also have one or more additional hard drives which you connect periodically and otherwise leave in a drawer.

Pros: Free, built into macOS, can browse file versions directly from many apps.

Cons: Needs ongoing manual intervention (i.e. plugging in the offline drives). Some reliability issues… but I've experienced the most problems backing up to my own SMB/AFP shares, so a Time Capsule might be OK.

- Backblaze (https://www.backblaze.com/) or CrashPlan (https://www.crashplan.com/): Both of these online backup services have $5/month unlimited plans, and both let you specify your own encryption key (in the form of an additional password), which isn't shared with the backup provider. Note: In my experience, Backblaze's client is much lighter on system resources/battery on Mac.

Pros: Inexpensive, off-site storage, low-maintenance.

Cons: Ongoing cost, requires trust (In theory, the client software could be sharing the encryption key with the company/the NSA/your nemesis).

- Arq (https://www.arqbackup.com/): Paid desktop software which can back up to many different destinations, including S3, Google Drive, or your own server via SFTP. You specify an encryption key for each destination.

Pros: Full control. Option to back up to another machine that you own (so no ongoing cost for hosting).

Cons: Up-front cost. Support is less straightforward than hosted solutions since Arq doesn't provide storage.


An unlisted con of Backblaze is that they delete all external drives if not plugged in for at least 6 consecutive hours every 30 days. It can be a huge pain if you travel regularly or otherwise don't want to leave your computer on all night.


Is this a new policy? Coincidentally I just restored something off a year-old backup from a dead machine's external drive, and it wasn't an issue. Maybe because the machine itself hasn't connected in a while?


The machine not connecting at all puts it in some kind of exempt from deletion state. They claim it is a six-month limit in their docs.

Note the point about having all external drives connected for that first boot or their backups will be wiped out. Very easy to shoot yourself in the foot.

https://help.backblaze.com/hc/en-us/articles/217664898-What-...


I can see how that would be a footgun. Thanks for the heads up.


Most importantly: it should be the backup server that logs into your computer to do the backup, and not the other way around. That way, if your computer is compromised, the backups are still there. If you make the mistake of having your computer connect to the backup server, a hacker could also log into it and delete everything.

My backup server uses rsnapshot, and you can only log into it with ssh + key + OTP.



Arq is such a wonderful piece of software. If you go this route, I'd say go for Amazon Cloud Drive as Arq's datastore; it's $59/year for unlimited data.



Duplicati is newer and better.


Been using this for a while now, seems to work great https://github.com/duplicati/duplicati

The features I like:

- Encrypted cloud backup

- Block-based backup (only backs up changes)

- Restore files from a certain day


I use Time Machine, Arq and Amazon Cloud Drive:

- I have an external HDD partitioned in half: One half is for large external files that don't change much (raw files, archived data etc); and one half is a dedicated partition for Time Machine

- Time Machine backs up my laptop. If I lose my computer but not my hard drive, I can get a new one and seamlessly get the computer back to exactly how it was when I last backed it up, open tabs and all

- I also have Arq running, attached to Amazon Cloud Drive (cheapest external storage I know of). It backs up both selected portions of my laptop's disk and the external HDD's non-Time-Machine partition (due to how TM works, you can't really back it up to the cloud[0]) to "the cloud"

This leaves me with:

- Three copies of my laptop data: in the laptop, in an external hdd and in the cloud

- Two copies of larger data that can't fit, in the external hdd and in the cloud. My external HDD lives at home.

[0] Time Machine backs up once an hour, and stores backups as a simple directory structure on disk of your entire hard drive, except using hard links to old backups to avoid duplication. It keeps the last 24 hours of hourly backups, the last 7 days of daily backups, and then weekly backups until it runs out of room.

This format simply doesn't work with the kind of backup where it scans a directory to see what's changed, because it effectively looks like you're adding hundreds of gigs of data each hour.
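The hard-link scheme described above can be sketched in a few lines (a toy model for illustration, not how Time Machine is actually implemented; it treats matching size and mtime as "unchanged"):

```python
# Toy illustration of hard-link snapshots: every snapshot is a complete
# directory tree, but files unchanged since the previous snapshot are
# hard links to it, so they take no extra disk space.
import os
import shutil

def snapshot(source, dest, previous=None):
    """Copy `source` into `dest`, hard-linking files unchanged since `previous`."""
    os.makedirs(dest, exist_ok=True)
    for name in os.listdir(source):
        src = os.path.join(source, name)
        dst = os.path.join(dest, name)
        prev = os.path.join(previous, name) if previous else None
        if os.path.isdir(src):
            snapshot(src, dst, prev if prev and os.path.isdir(prev) else None)
        elif (prev and os.path.isfile(prev)
              and os.path.getmtime(prev) == os.path.getmtime(src)
              and os.path.getsize(prev) == os.path.getsize(src)):
            os.link(prev, dst)      # unchanged: share the inode, no new data
        else:
            shutil.copy2(src, dst)  # new or changed: store a real copy
```

Each snapshot directory is independently browsable and restorable, which is exactly why deleting old snapshots is safe: the data survives as long as any snapshot still links to it.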


I second Borg backup; I use it on my Linux/Mac machines.

For Windows I use Macrium Reflect: https://www.macrium.com/products/home

I tried Acronis backup, but the disk restore failed; absolutely horrible software. I then tried Reflect, and the disk restore was very smooth.


For local bootable backup I use Mac Backup Guru, which I also wrote: https://macdaddy.io/mac-backup-software/ It's useful because it's the only software on OS X besides Time Machine which makes versioned (incremental) backups using hardlinks.

For remote backup I use Arq, but I have found that to be very buggy. I'm considering switching to rclone: https://rclone.org/

With both of those backup solutions in place I should be ready for pretty much everything.


Can you elaborate how Arq is buggy? I'm considering switching to Arq from Crashplan, because I supply my own storage for it anyway.


I have had to restore a backup once. I restored the most recent version, but it turned out to be an old version of the file. I ended up frantically restoring all the different versions of the backups, and one of them was the actual most recent version. But it was considered an old version by Arq; I'm not sure precisely which one, but it definitely wasn't in the top 3 most recent ones.

And more recently I made a backup to Glacier, and it is trapped "in progress", even though it appears that it may (or may not) be complete.

I haven't used it much, maybe 6 times, and 2 of the times it's had these catastrophic problems I'm talking about. I'm going to switch to another solution. I was considering rclone (like I mentioned) or Crashplan, ironically.


Crashplan worked well for me. Sometimes I have to restore a file that I overwrote, or that ended up corrupted because of buggy (in-house) software, and I never had any problems with that.

My main issue is that I went with the free plan + my own server because transfers to their servers were really bad. But to supply your own storage, you have to install Crashplan on the server, which uses a significant amount of RAM. So this rules out backing up to my NAS (not enough memory there, even though it's x86 and technically people have run Java on it).

The other issue is that the file format is (AFAIK) closed, and you also need an account and a connection to the Crashplan service, so I'm not completely sure that you really own the data. I didn't do much research here, though.


Was the Crashplan client performance better when backing up to your own server vs to theirs?


If you mean client performance in general, I never had problems with that; it's just that the connection to the server overseas was not that great. I have my own dedicated server in Europe to which I have a much faster connection.


I quite like Crashplan:

- very reasonably priced: I pay around £10 per month for unlimited storage for my whole family

- zero-knowledge encryption: I have the encryption keys, and everything is encrypted on my machine before it's sent up

- relatively low bandwidth: only ships changed files (pretty standard tbh)

It's saved my bacon a few times, e.g. I've used it to rescue my sister's dissertation when she wiped her laptop thinking it was in Dropbox when it wasn't. I was amazed by how easy it was for me to rescue the file from the archive.


I used crash plan on Linux for years, but stopped because their Java client is a train wreck.

It would consume gigabytes of RAM and every year or so it'd meltdown when trying to install an update without using the system package manager.


This is the same reason I left Crashplan. If they could get the Java client under control it would be a worthy contender, probably even the best consumer-friendly option.


I have all my data on Dropbox with revisions activated, and have that backed up by Crashplan. You get double automated backups and zero hassle managing it.


I have a related question. I want to back up certain folders to a portable USB HDD every night. Can anyone recommend a simple solution for that?

I don't need encryption or any extraneous features. I just need the selected directories to get mirrored to a backup location.

Currently, I am using SyncToy by Microsoft, but I was looking for a cross platform solution.



3-2-1 rule.

I would use a Time Capsule, and periodically (weekly?) connect an encrypted external drive and run a Borg backup there. Next week a second drive, the third week the first one again...

Always keep one of these drives off-site.

This is just one of many ways to get reasonably safe. (I use almost exactly this setup, just with deja-dup instead of Time Machine.)


I also use deja-dup but lately it seems to choke when trying to determine what to back up (I have about 175 GB of files, which doesn't seem outrageous). Have you had any issues with the speed? It could be that I have a long history saved on my backup drive and it's trying to apply too many incremental diffs.


I have no idea. Anything in the logs? Maybe move the backups to another location and start a new backup repo to see what happens?


I've used the following method for years and it's really simple. Get an external hard drive and partition it as needed. One for your Time Machine backup and another for data. Use Google Drive to mirror the data and use Arq as your Time Machine in the cloud.


I don't back up end-systems -- but I do have a directory with important data synced to several systems and the cloud using Syncthing. The rest of the data I care about is in git -- everything else on the system is basically disposable.


I think Amazon Cloud Drive is $60/yr if you have Prime. You can hook up your account to your Synology NAS and have it automatically back things up as soon as you copy it over. Also Synology can encrypt it on the fly as well.


attic <https://attic-backup.org/> encrypts data in transit via ssh, and deduplicates and encrypts data at rest. I have come to appreciate both how easy it is to restore data and the control you have over pruning which backups are kept around. You either need to be able to install attic on the remote host, or be able to mount the file system (i.e. via FUSE).


Do you think Google Drive (and others) are secure enough to store personal financial data like Quicken? Sometimes all you need is an email and password to get in.


You can always put extra secure documents inside password protected encrypted disk images.
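On macOS, one way to do that is with the built-in `hdiutil` tool. A sketch (the size, filesystem, volume name, and passphrase here are all placeholders to change for your setup):

```shell
#!/bin/sh
# Create a small AES-256-encrypted disk image for sensitive documents.
# macOS only; exits quietly on other platforms.
set -eu
if ! command -v hdiutil >/dev/null 2>&1; then
    echo "hdiutil is macOS-only; skipping" >&2
    exit 0
fi

# -stdinpass reads the passphrase from stdin so it never lands in shell history.
printf '%s' "correct horse battery staple" | hdiutil create \
    -size 100m -fs HFS+J -encryption AES-256 -stdinpass \
    -volname SecureDocs "$HOME/SecureDocs.dmg"
```

Mount it later with `hdiutil attach ~/SecureDocs.dmg` (you'll be prompted for the passphrase), and the resulting `.dmg` is safe to sync to any cloud provider.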



I see it's already been mentioned, but allow me to second Backblaze. I use it for hundreds of clients, and they have consistently been the most reliable of the backup services I have tested*. Since they also have versioning, you can recover from CryptoLocker variants comparatively easily as well.

(Almost any cloud-based backup system can help detect Locker variants. If you notice your daily backup data set suddenly shooting up in size, time to start checking for background encryption.)

(*Backblaze, Carbonite, Crashplan, Mozy, Acronis)
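The size-spike heuristic mentioned above is easy to automate. A crude sketch (the 5x threshold is an arbitrary assumption to tune for your own upload patterns):

```python
# Flag a backup run whose upload size is far above the recent daily average,
# which can indicate files being rewritten wholesale (e.g. by ransomware).
def looks_suspicious(recent_daily_bytes, todays_bytes, factor=5.0):
    """Return True if today's backup delta is `factor`x the recent average."""
    if not recent_daily_bytes:
        return False  # no history yet, nothing to compare against
    average = sum(recent_daily_bytes) / len(recent_daily_bytes)
    return average > 0 and todays_bytes > factor * average
```

Feed it the last week or two of per-run upload sizes from your backup tool's logs, and alert (or pause the backup schedule) when it fires.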


What sort of upload speeds do people doing cloud backups have?

I could never contemplate uploading hundreds of MB each day with my crappy ADSL2+.


I have basic cable-provided internet: 50 Mbps down, 3 Mbps up.

Initial backups take a long time, no doubt. But the daily diffs aren't so bad. As long as you use something smart enough to only upload the deltas, you're fine.

You can also snail mail drives to many cloud providers for your initial backup state if you want. I've never done so, but I've heard from friends that have that it worked out fine.


That's very helpful info, thanks!


I've used SpiderOak before. Zero-knowledge encryption; even the NSA probably can't access it.


Just stick it all in Google Drive, pay the $2/mo for 100GB or $10/mo for 1TB, done. It stores the last 100 versions of each file so ransomware shouldn't be a problem, although apparently there's no way to restore a folder at a time, you'd need to do it individually for each file...


Is there a good linux client? I couldn't get the ocaml client to work well enough.

I'd even be fine with a one-way backup that backs up my entire ~/ (excluding some specified dirs/files) as snapshots. I don't even need the two-way sync.


I've used rclone[1] with great success (it can sync specific folders which looks like what you want) but I've since then moved on to a commercial service called Insync[2] because I was having rate limiting problems with the Google Cloud API key that rclone needs.

I do wonder whether I should move back to rclone, because I don't like a third party having access to my Google Drive. It goes without saying: use a good encrypted backup solution. I like cryfs[3] for encrypting data.

[1]: https://rclone.org/

[2]: https://www.insynchq.com/

[3]: https://www.cryfs.org/


odrive has a Linux client, & supports Google Drive: https://docs.odrive.com/docs/odrive-sync-agent

Haven't actually tried it, seeing as I don't use Linux that often, but it's Python and hence reasonably cross-platform.


My solution for ransomware is to have a canary file.

Write a script that automates the backup, and hardcode a known SHA digest of the canary file (I like to make it a photo). Before the backup happens, recompute the digest and compare it to the hardcoded one. If they match, go ahead with the backup.


What if the encryption has begun, but hasn't reached your canary yet?


This is a great question. It takes some time for the ransomware to encrypt everything, so it is possible that the backup runs mid-encryption, before the ransomware has reached the canary, and backs up files that are already encrypted.

You could put hard links to your canary in all of the crucial locations, I suppose.

Or you could use rsnapshot on the destination side and freeze the backup schedule if something looks amiss.


For desktop backups, I think backblaze.com should do the job.


For desktops not running Linux, that is (Backblaze's backup client doesn't support Linux).


They now offer B2, which is really awesome.



