By Damiano Verzulli – GARRLab founder
In our daily life, defining and implementing proper “data protection” is always a priority.
It doesn’t really matter whether we’re talking about the photo taken with our smartphone during our child’s latest birthday, the draft of that research paper we’re working on with colleagues, or the whole archive of our company’s accounting records: all kinds of data are at risk, and we need to protect all of it!
With an always-increasing number of ‘connected’ devices, the main risk we face nowadays comes exactly from the internet: ransomware.
I’m not going to discuss ransomware in detail, as this is a topic widely covered by other GÉANT “Cyber Hero @ Home” articles. So let’s focus on the impact of ransomware on our precious data. What happens when our device (personal computer, notebook, smartphone, etc.) gets infected by ransomware? In short:
- all the data stored in the device is encrypted and the only technical way to decrypt it is via a decryption-key known only to the attacker;
- if additional devices (USB external disk, USB flash drive) are connected to our main PC, the data within such devices is encrypted as well;
- if our PC is connected to a LAN and has write-access to some shared-folder, the data within such shared-folders is encrypted as well;
- depending on the strain of ransomware, various additional attacks are carried out, trying to gain as much “access” to resources as possible (harvesting passwords, e-mail archives, databases, etc.).
Back to our data-protection plan: it should be noted from the second point above that keeping an external USB disk connected to our computer, with some software taking care of a daily copy of our data, is not enough: ransomware will also encrypt the backup!
The same problem also applies to backups sent to shared folders on a company server: ransomware will easily get access to those shared folders and proceed to encrypt everything it finds!
Relying on network resources like a company file server to store our data poses additional risks: as the resources are shared among multiple users, a successful infection suffered by a single user can easily impact all the other users sharing the same folders.
So, what can we do to protect our data?
To answer such a complex question, we first need to consider three sub-questions:
- How long will I keep my backup ready for a potential restore?
- How much data am I ready to lose, should I be forced to restore my data from a backup?
- Should I need a restore, how long am I ready to wait before the restore gives me my data back?
Answers to these three questions heavily depend on the context of our data: the photos on our smartphone can easily be kept for years (a), and we usually accept losing the photos we took yesterday or last week when the restore gives us back the last five years’ worth of photos (b). Also, we can easily accept a restore process lasting four weeks, should we need it (c).
Things change a lot when we deal with our company accounting records: we can easily keep them for years as well (a), but we really want to lose the minimum possible amount of data, as every missing record needs to be re-entered in the system once it’s back again (b). Also, the company needs to be up and running as fast as possible, so the restore procedure needs to be really fast: minutes, or hours at most (c).
Within the technical communities, these three concerns are known by the following terms:
- Retention (how long will I keep the copy?)
- RTO – Recovery Time Objective (after a data-loss, how much time do I need to wait before having my data back, once restored from the backup?)
- RPO – Recovery Point Objective (after a data-loss, at the moment I’ll be given my data back thanks to a successful restore, how “old” will the data be?)
These terms are often used to quickly identify the main characteristics of the data-protection plan that needs to be put in place. And even though we’re not data-protection specialists, I really think it’s useful to classify our own data by assigning it our own Retention/RTO/RPO metrics.
An additional golden rule relates to the physical place where we’re going to host our backup. As already seen in the first three points of the ransomware list above, keeping the backup inside a dedicated folder of our computer (be it on the internal hard drive, an external USB storage unit or a network shared folder) is definitely not a wise decision. An “off-site” backup is the way to go: a backup hosted in a place that is not reachable by our computer.
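As a rough illustration of such a classification, the three metrics can be written down per class of data and compared with what a given backup routine actually delivers. The sketch below is only an assumption-laden example: the `ProtectionTarget` class, the `meets_target` helper and all the numeric values are hypothetical, and the worst-case data loss is simplified to the interval between two backups.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class ProtectionTarget:
    """Hypothetical record of the Retention/RTO/RPO targets for one class of data."""
    name: str
    retention: timedelta  # how long a backup copy is kept
    rto: timedelta        # longest acceptable wait for a restore
    rpo: timedelta        # oldest acceptable age of the restored data


def meets_target(target: ProtectionTarget,
                 backup_interval: timedelta,
                 restore_duration: timedelta) -> bool:
    """Check a backup schedule and a measured restore time against the targets.

    Simplifying assumption: the worst-case data loss equals the interval
    between two consecutive backups.
    """
    return backup_interval <= target.rpo and restore_duration <= target.rto


# Illustrative values only, loosely inspired by the two examples in the text.
photos = ProtectionTarget("family photos", timedelta(days=5 * 365),
                          rto=timedelta(weeks=4), rpo=timedelta(weeks=1))
accounting = ProtectionTarget("accounting records", timedelta(days=10 * 365),
                              rto=timedelta(hours=4), rpo=timedelta(hours=1))

# A daily backup with a two-day restore suits the photos...
print(meets_target(photos, timedelta(days=1), timedelta(days=2)))
# ...but a daily backup is too infrequent for a one-hour RPO.
print(meets_target(accounting, timedelta(days=1), timedelta(hours=2)))
```

Writing the targets down this way makes it easier to notice when a routine that is perfectly fine for one class of data falls short for another.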
In our private world, an “off-site” backup can easily be done on an external-drive physically kept at home, but disconnected from the main PC and connected to it only during the backup activities.
In a company, an “off-site” backup is typically stored in a building other than the one hosting the real/production systems and data.
As you might guess, physically moving the backup storage here and there quickly becomes a burden, with the consequence that a missed backup becomes a real possibility. What are the alternatives?
The internet provides a potential solution: cloud-storage. Especially when dealing with the smartphone ecosystem, cloud-storage can easily be chosen as the place holding the main data and/or the related backup. It should be noted that even though a ransomware attack targeting the smartphone can easily take control of the cloud-storage and encrypt the whole content, it is common practice for cloud-providers to offer features that track modifications to files (“versioning”) and, as such, let the user “roll back” unexpected changes. Here, the invitation is to carefully review the feature-set offered by the cloud-provider and check whether it actually provides such a “versioning” feature.
Cloud-storage can be an option for ordinary PCs and notebooks too, in both personal and business environments. Business environments, however, impose additional requirements, such as GDPR compliance and higher levels of service availability and security policy, whose impact should be carefully reviewed.
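To see why versioning helps, consider a toy in-memory store that never overwrites a file but appends every write as a new version. This is only a sketch of the idea: `VersionedStore` and its methods are hypothetical names, and real cloud providers implement versioning in their own ways, usually with limits on how many versions are kept and for how long.

```python
class VersionedStore:
    """Toy store that keeps every version of every file (illustration only)."""

    def __init__(self):
        self._versions = {}  # file name -> list of contents, oldest first

    def write(self, name: str, content: bytes) -> None:
        # A write never destroys earlier content: it appends a new version.
        self._versions.setdefault(name, []).append(content)

    def read(self, name: str) -> bytes:
        return self._versions[name][-1]

    def rollback(self, name: str, steps: int = 1) -> bytes:
        """Make the version `steps` writes ago the current one again."""
        history = self._versions[name]
        if steps < 1 or steps >= len(history):
            raise ValueError("no such earlier version")
        restored = history[-1 - steps]
        history.append(restored)  # the rollback itself becomes a new version
        return restored


store = VersionedStore()
store.write("paper.txt", b"draft v1")
store.write("paper.txt", b"<content encrypted by ransomware>")
store.rollback("paper.txt")
print(store.read("paper.txt"))
```

Because the ransomware's write is just one more version, the previous content is still there to be restored.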
When the amount of data to protect starts exceeding a significant threshold (in the range of ~10 TB), the time required to physically take a full copy of the whole set quickly becomes critical: what happens when our “backup” takes ~20 hours to be written to a tape? And what happens when it takes more than 24 hours? As a single restore of such a large set of data will easily take more than one day, are we going to accept an RTO higher than one day (the day required to restore data), and an RPO higher than two days (the day required by the backup, plus the day required by the following restore)?
To mitigate these problems, faster and larger backup storage can be employed (scale up) and the number of concurrent backup engines can be increased (scale out). But despite these efforts, it’s becoming really easy to reach a point where classical backup infrastructures simply cannot reconcile the huge amount of data to protect, on one side, with the business requirements (retention, RTO, RPO) on the other. Before reaching such a critical point, it’s definitely wise to perform a detailed analysis of the current backup infrastructure, including some real testing of a restore scenario.
A different approach to addressing the complexities of backup solutions comes from software-based infrastructures. The wide adoption of virtualisation technologies and, more recently, of so-called Software Defined Storage (SDS) platforms has started to offer a completely new approach to data-protection. Those storage systems:
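The arithmetic behind these figures is easy to reproduce. The sketch below assumes decimal units (1 TB = 1,000,000 MB) and a constant sustained throughput, which real tape or disk systems rarely achieve; the function names are illustrative only.

```python
def transfer_hours(size_tb: float, throughput_mb_s: float) -> float:
    """Hours needed to copy size_tb terabytes at a sustained throughput in MB/s."""
    size_mb = size_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    return size_mb / throughput_mb_s / 3600


def required_throughput_mb_s(size_tb: float, window_hours: float) -> float:
    """Sustained MB/s needed to copy size_tb terabytes within window_hours."""
    return size_tb * 1_000_000 / (window_hours * 3600)


# Copying 10 TB in 20 hours needs roughly 139 MB/s, sustained for the whole run.
print(round(required_throughput_mb_s(10, 20), 1))
# At a sustained 100 MB/s, the same 10 TB would take almost 28 hours.
print(round(transfer_hours(10, 100), 1))
```

The same functions apply to the restore: if writing the backup takes about a day, reading it all back is unlikely to be much faster.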
- are able to replicate and distribute data, on-the-fly, to multiple geographical locations;
- are able to track changes to data, and focus the replication activities only on the changed data;
- are able to perform “snapshots”, holding a frozen virtual-copy of data, without impacting the common usage of the storage system;
- are able to perform complex on-the-fly compression, encryption and data-de-duplication activities to heavily reduce the amount of data to transfer outside the system.
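The idea of replicating only what has changed can be illustrated with a toy content-addressed store: data is split into chunks, each chunk is identified by its hash, and only chunks the remote side does not already hold are sent. This is a deliberately simplified sketch; the 4-byte chunk size, the `replicate` function and the dictionary used as the "remote" are all assumptions made for readability, and real SDS platforms use their own block-level mechanisms.

```python
import hashlib

CHUNK_SIZE = 4  # bytes; unrealistically small, to keep the example readable


def chunk_digests(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Split data into fixed-size chunks and return (sha256 digest, chunk) pairs."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    return [(hashlib.sha256(chunk).hexdigest(), chunk) for chunk in chunks]


def replicate(data: bytes, remote_store: dict) -> int:
    """Send to remote_store only the chunks it lacks; return the bytes sent."""
    sent = 0
    for digest, chunk in chunk_digests(data):
        if digest not in remote_store:
            remote_store[digest] = chunk
            sent += len(chunk)
    return sent


remote = {}
first = replicate(b"AAAABBBBCCCCDDDD", remote)   # first run: every chunk is new
second = replicate(b"AAAABBBBXXXXDDDD", remote)  # later run: one chunk changed
print(first, second)
```

In this toy run the first replication sends all 16 bytes, while the second sends only the 4 bytes of the modified chunk.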
More and more companies are embracing these technologies, thus shifting their approach from the classical backup towards a more distributed/replicated-storage infrastructure.
Moreover, these technologies are currently also available under open-source licences on both Linux and other Unix-based operating systems. Current releases of OpenZFS and BTRFS provide really advanced storage features, including compression, snapshotting and snapshot-stream replication, with OpenZFS also offering native encryption.
So, to recap:
- to back up your private data, on your smartphone and/or on your PC or notebook, consider using a cloud-storage provider but, please, ensure that it offers a “versioning” feature, giving you the possibility to “roll back” to previous versions, so that you are protected should some ransomware encrypt the whole content. Alternatively, feel free to rely on external devices (like USB sticks) but, please, keep them physically disconnected, and connect them only when needed. Should you have lots of data to back up, consider adopting higher-level storage devices providing their own backup software suite;
- in a small office, where the amount of data is relatively small (hundreds of gigabytes), try to centralise all data on a file-server and focus on it. Adopt some backup-specific software (in order NOT to rely only on shared-folders), hosting the backed-up data outside the server. Take a look at Bareos, an enterprise-grade, open-source backup solution;
- in larger environments, think carefully in terms of retention, RTO and RPO. Try to understand if you’re close to the physical limit of your current backup infrastructure (probably centred around tape-drives and physical tapes). Consider shifting your storage to an SDS-based platform, offering advanced data-management (encryption, compression, deduplication) and replication features. Again, have a look at Bareos, as it can really help.
Regardless of your context, keep in mind that “backup” is a love-or-hate thing, with nothing in the middle: you’ll deeply love it if it gives you back your data when you need it; or you’ll deeply hate it, because it may give you nothing when you were expecting your data instead!
So, take your time, and start (re)planning your backup/data-protection strategy!
About the author
With a Computer Science degree, in 1996 Damiano joined CINECA as Internet Developer. From 1999 to 2002, he led the “Web Development” team of Nextra, a Norway-based corporation approaching the European ISP market. From 2003 to 2021 he was the Access-Port-Manager (APM) for the University “G. D’Annunzio”, supporting operations of local infrastructure and data centre. In 2019 Damiano founded the GARRLab community and in 2021 started collaborating with GARR on cybersecurity. Very technical and Open Source addicted, Damiano tries to stay aligned with current developments in internet technologies.
Also this year GÉANT joins the European Cyber Security Month, with the 'Cyber Hero @ Home' campaign. Read articles from cyber security experts within our community and download resources from our awareness package on https://dev.connect.geant.org/csm2021