83
How to save data for archive purposes?
(lemmy.ml)
From Wikipedia, the free encyclopedia
Linux is a family of open source Unix-like operating systems based on the Linux kernel, an operating system kernel first released on September 17, 1991 by Linus Torvalds. Linux is typically packaged in a Linux distribution (or distro for short).
Distributions include the Linux kernel and supporting system software and libraries, many of which are provided by the GNU Project. Many Linux distributions use the word "Linux" in their name, but the Free Software Foundation uses the name GNU/Linux to emphasize the importance of GNU software, causing some controversy.
Community icon by Alpár-Etele Méder, licensed under CC BY 3.0
I would use maybe a Raspberry Pi or old laptop with two drives (preferably different brands/age, HDD or SSD doesn't really matter) in it using a checksumming filesystem like btrfs or ZFS so that you can do regular scrubs to verify data integrity.
Then, from that device, pull the data from your main system as needed (that way, the main system has no way of breaking into the backup device so won't be affected by ransomware), and once it's done, shut it off or even unplug it completely and store it securely, preferably in a metal box to avoid any magnetic fields from interfering with the drives. Plug it in and boot it up every now and then to perform a scrub to validate that the data is all still intact and repair the data as necessary and resilver a drive if one of them fails.
The unfortunate reality is most storage mediums will eventually fade out, so the best way to deal with that is an active system that can check data integrity and correct the files, and rewrite all the data once in a while to make sure the data is fresh and strong.
If you're really serious about that data, I would opt for both an HDD and an SSD, and have two of those systems at different locations. That way, if something shakes up the HDD and damages the platter, the SSD is probably fine, and if it's forgotten for a while maybe the SSD's memory cells will have faded but not the HDD. The strength is in the diversity of the mediums. Maybe burn a Blu-Ray as well just in case, it'll fade too but hopefully differently than an SSD or an HDD. The more copies, even partial copies, the more likely you can recover the entirety of the data, and you have the checksums to validate which blocks from which medium is correct. (Fun fact, people have been archiving LaserDiscs and repairing them by ripping the same movie from multiple identical discs, as they're unlikely to fade at exactly the same spots at the same time, so you can merge them all together and cross-reference them and usually get a near perfect rip of it).
an important detail here is to add the 2 disks to the filesystem in a way so that the second one does not extend the capacity, but adds parity. on ZFS, this can be done with a mirror vdev (simplest for this case) or a raidz1 vdev.
"The strength is in the diversity of the mediums" I like that. Should be part of the book of Zen for Backups. Thank you for your insights.