Scripting Backups with bash on Linux

Over the years, Linux has gone from being something exclusively for the hardcore tech-savvy to something accessible to all. Modern GUIs provide a friendly means for anybody to interact with the operating system, without needing to use the terminal.

However, the terminal remains one of the areas where Linux shines, and is a great way for power users to automate routine tasks. Taking backups is one such mundane and repetitive activity which can very easily be scripted using shell scripts. In this article, we’ll see how to do this using the bash shell, which is found in the more popular distributions such as Ubuntu.

Simple Folder Backup with Timestamp

Picture this: we have a couple of documents in our Documents folder, and we’d like to back up the entire folder and append a timestamp in the name. The output filename would look something like:

Documents-2020.01.21-17.25.tar.gz

The .tar.gz format is a common way to compress a collection of files on Linux. gzip (the .gz part) is a powerful compression algorithm, but it can only compress a single file, and not an entire folder. Therefore, a folder is typically packaged into a single .tar file (historically standing for “tape archive”). This tarball, as it’s sometimes called, is then compressed using gzip to produce the resulting .tar.gz file.

The timestamp format above is something arbitrary, but I’ve been using this myself for many years and it has a number of advantages:

  1. It loosely follows the ISO 8601 format, starting with the year, and thus avoiding confusion between British (DD/MM) and American (MM/DD) date notation.
  2. For the same reason, it allows easy time-based sorting of backup files within a directory.
  3. The use of dots and dashes comes in handy when the filename is too long and is wrapped onto multiple lines. Related parts (e.g. the year, month and day of the date) stick together, but separate parts (e.g. the date and time) can be broken onto separate lines. Also, this takes up less (screen) space than other methods (e.g. using underscores).

To generate such a timestamp, we can use the Linux date command:

$ date
Tue 21 Jan 2020 05:35:34 PM UTC

That gives us the date, but it’s not in the format we want. For that, we need to give it a parameter with the format string:

$ date +"%Y.%m.%d-%H.%M.%S"
2020.01.21-17.37.14

Much better! We’ll use this shortly.

Let’s create a backups folder in the home directory as a destination folder for our backups:

$ mkdir ~/backups

We can then use the tar command to bundle our Documents folder into a single file (-c is to create, and -f is for file archive):

$ tar -cf ~/backups/Documents.tar ~/Documents

This works well as long as you run it from the home directory. But if you run it from somewhere else, you will see an unexpected structure within the resulting .tar file:

Instead of putting the Documents folder at the top level of the .tar file, it seems to have replicated the folder structure leading up to it from the root folder. We can get around this problem by using the -C switch and specifying that it should run from the home folder:

$ tar -C ~ -cf ~/backups/Documents.tar Documents

This solves the problem, as you can see:

Now that we’ve packaged the Documents folder into a .tar file, let’s compress it. We could do this using the gzip command, but as it turns out, the tar command itself has a -z switch that produces a gzipped tarball. After adding that and a -v (verbose) switch, and changing the extension, we end up with:

$ tar -C ~ -zcvf ~/backups/Documents.tar.gz Documents
Documents/
Documents/some-document.odt
Documents/tasks.txt

All we have left is to add a timestamp to the filename, so that we can distinguish between different backup files and easily identify when those backups were taken.

We have already seen how we can produce a timestamp separately using the date command. Fortunately, bash supports something called command substitution, which allows you to embed the output of a command (in this case date) in another command (in this case tar). It’s done by enclosing the first command between $( and ):

$ tar -C ~ -zcvf ~/backups/Documents-$(date +"%Y.%m.%d-%H.%M.%S").tar.gz Documents

As you can see, this produces the intended effect:

Adding the Hostname

If you have multiple computers, you might want to back up the same folder (e.g. Desktop) on all of them, but still be able to distinguish between them. For this we can again use command substitution to insert the hostname in the output filename. This can be done either with the hostname command or using uname -n:

$ tar -C ~ -zcvf ~/backups/Desktop-$(uname -n)-$(date +"%Y.%m.%d-%H.%M.%S").tar.gz Desktop

Creating Shell Scripts

Although the backup commands we have seen are one-liners, chances are that you don’t want to type them in again every time you want to take a backup. For this, you can create a file with a .sh extension (e.g. backup-desktop.sh) and put the command in there.

However, if you try to run it (always preceded by a ./), it doesn’t actually work:

The directory listing in the above screenshot shows why this is happening. The script is missing execution permissions, which we can add using the chmod command:

$ chmod 755 backup-desktop.sh

In short, that 755 value is a representation of permissions on a file which grants execution rights to everybody (see Chmod Calculator or run man chmod for more information on how this works).

A directory listing now shows that the file has execution (x) permissions, and the terminal actually highlights the file in bold green to show that it is executable:

Running Multiple Scripts

As we start to back up more folders, we gradually end up with more of these tar commands. While we could put them one after another in the same file, this would mean that we always backup everything, and we thus lose the flexibility to back up individual folders independently (handy when the folders are large and backups take a while).

For this, it’s useful to keep separate scripts for each folder, but then have one script to run them all (see what I did there?).

There are many ways to run multiple commands in a bash script, including simply putting them on separate lines one after another, or using one of several operators designed to allow multiple commands to be run on the same line.

While the choice of which to use is mostly arbitrary, there are subtle differences in how they work. I tend to favour the && operator because it will not carry out further commands if one has failed, reducing the risk that a failure goes unnoticed:

$ ./backup-desktop.sh && ./backup-documents.sh

Putting this in a file called backup-all.sh and executing it has the effect of backing up both the Desktop and the Documents folders. If I only want to backup one of them, I can still do that using the corresponding shell script.

Conclusion

We have thus far seen how to:

  • Package a folder in a compressed file
  • Include a timestamp in the filename
  • Include the hostname in the filename
  • Create bash scripts for these commands
  • Combine multiple shell scripts into a single one

This helps greatly in automating the execution of backups, but you can always take it further. Here are some ideas:

  • Use cron to take periodic backups (e.g. daily or weekly)
  • Transfer the backup files off your computer – this really depends what you are comfortable with, as it could be an external hard drive, or another Linux machine (e.g. using scp), or a cloud service like AWS S3.

Leave a Reply

Your email address will not be published. Required fields are marked *