Backup Utility

Overview

The backup utility ensures that all files created by the acquisition controller, including all image and metadata files, are safely transferred to redundant external drives. To do so, the utility has to discover which drives are connected, mount their partitions if necessary, then examine each partition to find an identifier file. This file, a simple YAML file, shall contain the drive's name, serial number and type (e.g. ifbu, transport, archive).

Mounting Partitions

Before copying the files, the utility must first find where to copy them. This is done by inspecting the kernel status files. Specifically, /proc/partitions lists all the devices connected to the system and the partitions on them. The utility parses that file and finds all entries that correspond to external storage devices. We consider all devices whose names match "sd.+" [1] to be external storage devices. Each device should have at least two entries: one for the device itself ("sd." [1]) and one for each partition on the device ("sda\d" [1] for device sda).
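The parsing step described above could be sketched as follows. This is a minimal illustration, not the utility's actual code: the function name parse_partitions and the sample text are hypothetical, and the patterns are a slightly stricter variant of the ones quoted in the text (letters only after "sd").

```python
import re

# Assumed patterns: whole device such as "sdb", partition such as "sdb1".
DEVICE_RE = re.compile(r"^sd[a-z]+$")
PARTITION_RE = re.compile(r"^(sd[a-z]+)(\d+)$")

# Hypothetical sample in the layout used by /proc/partitions.
SAMPLE = """major minor  #blocks  name

   8        0  488386584 sda
   8        1     524288 sda1
   8        2  487861248 sda2
   8       16 1953514584 sdb
 259        0  500107608 nvme0n1
"""


def parse_partitions(text):
    """Return {device: [partition, ...]} for the sd* entries in the text."""
    devices = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have four columns: major, minor, #blocks, name.
        if len(fields) != 4 or not fields[0].isdigit():
            continue
        name = fields[3]
        part = PARTITION_RE.match(name)
        if part:
            devices.setdefault(part.group(1), []).append(name)
        elif DEVICE_RE.match(name):
            # Register the device even if it has no partition yet.
            devices.setdefault(name, [])
    return devices
```

On a Linux system the input would come from reading /proc/partitions; the sample string stands in for it so the sketch can be tried anywhere.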

This scheme was developed with Ubuntu Server in mind, but the first release of Albatross will use Ubuntu Desktop, which, once a user is logged in, already runs a service that mounts new partitions as they are connected. This utility will therefore be disabled until further notice.

Discovering Acceptable Partitions

Once the list of partitions is created, the utility must determine which partitions are mounted and which are not. This is done by parsing the kernel status file /proc/mounts, which lists the partitions that are currently mounted and their mount points. The utility then mounts every partition that appears in the partition list but not in the mount list.
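The comparison between the partition list and the mount list might look like the sketch below. The function names are hypothetical, and the actual mount call is left out because the text does not say how mounting is performed.

```python
def parse_mounts(text):
    """Return {partition name: mount point} for /dev/* rows of /proc/mounts."""
    mounts = {}
    for line in text.splitlines():
        fields = line.split()
        # Rows look like: device mount_point fstype options dump pass
        if len(fields) < 2 or not fields[0].startswith("/dev/"):
            continue
        name = fields[0][len("/dev/"):]
        # The kernel writes spaces in mount points as the escape \040.
        mount_point = fields[1].replace("\\040", " ")
        # Keep the first mount point if a partition is mounted more than once.
        mounts.setdefault(name, mount_point)
    return mounts


def unmounted(partitions, mounts):
    """Return the partitions that have no entry in the mount table."""
    return [name for name in partitions if name not in mounts]
```

Keeping the parser separate from the mounting action makes the decision logic easy to exercise with canned text.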

The whole discovery and mounting procedure runs asynchronously from the main thread. When the utility is launched, a separate thread is spawned that monitors the partition and mount files, refreshes the list of found devices, and mounts and unmounts partitions as devices are connected and disconnected. The DriveManager class is responsible for those functions.
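A background monitor of this kind could be sketched as below. The class DriveMonitor is a hypothetical stand-in, not the real DriveManager: it assumes simple polling at a fixed interval, takes the two readers as injected callables, and only tracks state (mounting and unmounting are omitted).

```python
import threading


class DriveMonitor:
    """Polls partition and mount information on a background thread."""

    def __init__(self, read_partitions, read_mounts, interval=2.0):
        self._read_partitions = read_partitions  # returns a list of names
        self._read_mounts = read_mounts          # returns {name: mount point}
        self._interval = interval
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._state = {}
        self._thread = threading.Thread(target=self._run, daemon=True)

    def refresh(self):
        """Rebuild the {partition: mount point or None} table once."""
        partitions = self._read_partitions()
        mounts = self._read_mounts()
        state = {name: mounts.get(name) for name in partitions}
        with self._lock:
            self._state = state

    def snapshot(self):
        """Return a copy that the main thread can read safely."""
        with self._lock:
            return dict(self._state)

    def _run(self):
        while not self._stop.is_set():
            self.refresh()
            # wait() doubles as an interruptible sleep.
            self._stop.wait(self._interval)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()
```

The lock keeps the main thread from observing a half-updated table, and the event lets stop() interrupt the sleep instead of waiting out a full interval.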

Selecting Backup Locations

With the list of partitions and mount points created earlier, the utility must determine which of those locations are suitable for backup. All Heliolytics drives shall be initialised with a metadata file. This file shall contain the drive's serial number and any other identification we deem necessary, such as a Pokemon name. In addition to the name and serial number, the metadata file shall contain a type identifier that indicates whether the drive is suitable for in-flight backup, transport, and so on.
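Selection by drive type could be sketched as follows. Several things here are assumptions: the key names (name, serial, type), the flat "key: value" layout, and the function names. The tiny parser only handles that flat layout so the sketch needs no third-party package; a real implementation would more likely use a YAML library.

```python
REQUIRED_KEYS = ("name", "serial", "type")  # assumed key names


def parse_identifier(text):
    """Parse flat 'key: value' lines; a simplification of real YAML."""
    info = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)
        info[key.strip()] = value.strip().strip("'\"")
    return info


def select_locations(drives, wanted_type):
    """drives maps mount point -> identifier file text.

    Returns the mount points whose identifier is complete and whose
    type equals wanted_type.
    """
    selected = []
    for mount_point, text in drives.items():
        info = parse_identifier(text)
        complete = all(key in info for key in REQUIRED_KEYS)
        if complete and info["type"] == wanted_type:
            selected.append(mount_point)
    return selected
```

Partitions without a complete identifier are skipped, so an unrelated USB stick plugged into the rig would never be picked as a backup target.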

Reception of Telemetry

Just like the middleware, the backup utility will subscribe to the telemetry stream published by the acquisition controller. This is done so that the backup utility can discover the files it needs to copy. The telemetry structure contains a list of files organised in a map where the key is either the camera name or "metadata" and where the value is the path of the file.

When receiving this packet, the backup utility will request a list of backup locations from the modules described previously. If more than two locations are found, only the first two shall be used for the copy operation. The utility will then add a task for each file and each location to a queue. A work scheduler responsible for managing a pool of threads will then dispatch each task whenever a worker thread becomes available.
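The fan-out from one telemetry packet to copy tasks might be sketched like this. The names build_tasks and dispatch are hypothetical, and Python's ThreadPoolExecutor stands in for the work scheduler and its queue; the text does not say which mechanism the utility uses.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_LOCATIONS = 2  # only the first two backup locations are used


def build_tasks(files, locations):
    """files maps camera name (or "metadata") -> file path.

    Returns one (source, destination) pair per file and per location.
    """
    targets = locations[:MAX_LOCATIONS]
    return [(path, dest) for path in files.values() for dest in targets]


def dispatch(tasks, copy_fn, workers=4):
    """Run copy_fn(source, destination) for every task on a thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(copy_fn, src, dst) for src, dst in tasks]
        # Results come back in submission order.
        return [future.result() for future in futures]
```

The executor queues submitted tasks internally and hands each one to the next idle worker, which matches the behaviour described above.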

Each task consists of an rclone copy command. The utility trusts that if rclone reports success, the file has been copied correctly and the integrity of the copy has been checked against the original. When a copy operation returns successfully, a new record is added to the middleware's database; this record lists the new file location for the image.
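A single copy task could be sketched as below, assuming rclone is invoked as a subprocess and that a zero exit status means success. The record callback is a hypothetical hook standing in for the middleware database insert, whose interface is not described here.

```python
import subprocess


def rclone_copy(source, dest_dir, run=subprocess.run):
    """Copy one file into dest_dir with `rclone copy`.

    Returns True when rclone exits with status 0. The run parameter
    exists so the command can be replaced in tests.
    """
    result = run(["rclone", "copy", source, dest_dir])
    return result.returncode == 0


def copy_and_record(source, dest_dir, record, run=subprocess.run):
    """Copy the file and, only on success, report the new location."""
    if rclone_copy(source, dest_dir, run):
        record(source, dest_dir)  # hypothetical database hook
        return True
    return False
```

Recording the location only after a successful exit keeps the database from listing copies that were never completed.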

Disk Space Rolling Buffer

Once the images have been copied to two backup locations, it is safe to delete them from local storage. This way, local storage only acts as a high speed buffer to hold the images until they can be copied to the slower external disks. That being said, there is no need to delete them right away. Rather, the backup utility will remove the older images when disk space starts to run low. The exact limit is configurable but whatever the number is, the working principle remains the same.

The backup utility will query the middleware database for the list of images that have more than 2 file locations. More than 2 locations indicates that an image is backed up to two external disks and is still present on the local drive. The images are ordered oldest first, and the utility retrieves the local copy of each. Those copies are deleted and the corresponding file entries are removed from the database.
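The selection logic could be sketched as follows. The record layout (timestamp, size, locations), the local_root prefix test, and the function names are all assumptions made for illustration; the real data would come from the middleware database, whose schema is not given here.

```python
import shutil


def bytes_to_free(local_root, min_free_bytes):
    """Return how many bytes must be released to reach the configured limit."""
    free = shutil.disk_usage(local_root).free
    return max(0, min_free_bytes - free)


def select_for_deletion(images, local_root, needed_bytes):
    """Pick local copies to delete, oldest first.

    images is a list of dicts with "timestamp", "size" and "locations".
    Only images with more than 2 locations are eligible.
    """
    eligible = sorted(
        (img for img in images if len(img["locations"]) > 2),
        key=lambda img: img["timestamp"],
    )
    selected, freed = [], 0
    for img in eligible:
        if freed >= needed_bytes:
            break
        local = [p for p in img["locations"] if p.startswith(local_root)]
        if not local:
            continue
        selected.append(local[0])
        freed += img["size"]
    return selected
```

The caller would then delete each returned path and remove the matching file entry from the database, stopping as soon as enough space has been recovered.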

Disk Discovery Service

Other services may require information about the backup drives currently connected to the rig. The backup utility already discovers and mounts the disks and partitions, so it makes sense to get the list of devices from this executable. To do so, the backup utility exposes a ZMQ socket on port 5557. Upon reception of a request (any string will be interpreted as a request), the backup utility will return a protobuf structure containing the list of connected drives, their paths, and other relevant information. This service is used at least by the middleware, which needs to output a manifest file to all the connected drives. Other clients may appear in the future as well.
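A request/reply loop of this shape could be sketched with pyzmq as below. Two assumptions are worth flagging: the socket type is taken to be REP, which the text does not state, and JSON is used as a stand-in for the protobuf structure because the message definition is not given here. The function names are hypothetical.

```python
import json

PORT = 5557


def build_reply(drives):
    """Serialise the drive list. JSON stands in for the protobuf message."""
    return json.dumps({"drives": drives}).encode("utf-8")


def serve(get_drives, port=PORT):
    """Answer every incoming request with the current drive list."""
    import zmq  # pyzmq, imported lazily so build_reply works without it

    context = zmq.Context()
    socket = context.socket(zmq.REP)  # assumed request/reply pattern
    socket.bind(f"tcp://*:{port}")
    while True:
        socket.recv()  # the payload is ignored: any message is a request
        socket.send(build_reply(get_drives()))
```

A client such as the middleware would connect a matching request socket to the same port, send any string, and decode the reply.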

[1] Follows regular expression syntax: . matches any character except a line break, \d matches a digit, and + means one or more repetitions.