Instance Backups

At the moment, we support only AWS deployments. Therefore, we are going to use the AWS backup service, as described in this document (the link is private).

AWS

RDS

RDS backups are on by default. We already allow configuring the retention period, but we should make it default to the maximum value (35 days).
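
The default described above could be resolved as in the following minimal sketch. The function name `resolve_backup_retention_days` and the optional per-instance override are assumptions made for illustration, not existing names in our scripts. The 35-day ceiling is the RDS maximum for automated backups, and a value of 0 would disable them, so the sketch rejects it.

```python
from typing import Optional

# RDS keeps automated backups for at most 35 days; 0 disables them entirely.
RDS_MAX_RETENTION_DAYS = 35


def resolve_backup_retention_days(configured: Optional[int] = None) -> int:
    """Return the retention period to apply, defaulting to the RDS maximum.

    `configured` is a hypothetical per-instance override.
    """
    if configured is None:
        return RDS_MAX_RETENTION_DAYS
    if not 1 <= configured <= RDS_MAX_RETENTION_DAYS:
        raise ValueError(
            f"retention must be between 1 and {RDS_MAX_RETENTION_DAYS} days, "
            f"got {configured}"
        )
    return configured
```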

MongoDB Atlas

We already support cloud backups with MongoDB Atlas in our Terraform scripts. In addition, we should set up the backup schedule and keep its restore window in sync with the AWS RDS retention period.

S3

We did a discovery on this over 3 years ago - BB-182 (private link). The results are described in this discovery document. We decided to use bucket versioning for S3 backups. The approach hasn't changed much since then, and we have been using it with Ocim for almost 3 years. Note that this approach protects the data only from accidental deletion and modification.

How the bucket versioning works

  • When we add an object (via PUT, POST, or COPY), S3 assigns it a unique ID that identifies its version:
    [Figure: S3 versioning PUT request]
  • When we GET an object by its key, the most recent version is returned:
    [Figure: S3 versioning GET request]
  • When we DELETE an object, a "Delete Marker" is added as the current version of that object; the previous version becomes a non-current version:
    [Figure: S3 versioning DELETE request]
  • "Delete Markers" are placeholders with their own Key and ID. If we DELETE an object whose current version is already a "Delete Marker", a new marker is added on top of it:
    [Figure: S3 versioning double DELETE]
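
The behaviour listed above can be illustrated with a toy in-memory model. This is only a sketch of the semantics, not an S3 client: the class `VersionedBucket` and its sequential `v1`, `v2`, ... IDs are hypothetical (S3 generates opaque IDs), and a missing or deleted object is signalled with `KeyError` rather than an HTTP error.

```python
import itertools
from typing import Dict, List, Optional, Tuple

# Each entry is (version_id, body); body is None for a delete marker.
Version = Tuple[str, Optional[bytes]]


class VersionedBucket:
    """Toy model of a versioning-enabled bucket."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[Version]] = {}
        self._ids = itertools.count(1)

    def _push(self, key: str, body: Optional[bytes]) -> str:
        version_id = f"v{next(self._ids)}"
        # The newest entry is always last; older ones become non-current.
        self._versions.setdefault(key, []).append((version_id, body))
        return version_id

    def put(self, key: str, body: bytes) -> str:
        """Store a new current version and return its ID."""
        return self._push(key, body)

    def delete(self, key: str) -> str:
        """Add a delete marker as the current version; nothing is removed."""
        return self._push(key, None)

    def get(self, key: str, version_id: Optional[str] = None) -> bytes:
        """Return the current version, or a specific one if an ID is given."""
        versions = self._versions.get(key, [])
        if version_id is None:
            candidates = versions[-1:]
        else:
            candidates = [v for v in versions if v[0] == version_id]
        if not candidates or candidates[0][1] is None:
            raise KeyError(key)
        return candidates[0][1]

    def list_versions(self, key: str) -> List[str]:
        """Version IDs for `key`, newest first; delete markers are included."""
        return [vid for vid, _ in reversed(self._versions.get(key, []))]
```

In this model, calling `delete` twice on the same key leaves two delete markers stacked on top of the data versions, and the older data stays retrievable through `get(key, version_id)`.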

How to restore overwritten objects

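  • COPY (duplicate) the previous version of the object that we want to restore; the copy becomes the current version:
    [Figure: S3 versioning restore COPY]
  • Alternatively, DELETE all newer versions (or "Delete Markers", but they can be deleted only by the bucket owner):
    [Figure: S3 versioning restore DELETE]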
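
Both restore approaches can be sketched against a boto3 S3 client. The helper names `restore_by_copy` and `restore_by_delete` are hypothetical, and the `s3` argument is assumed to be the result of `boto3.client("s3")`. The sketch ignores pagination and compares `LastModified` timestamps, so it assumes that the key's history fits in one response and that no two versions share a timestamp.

```python
from typing import Any, List


def restore_by_copy(s3: Any, bucket: str, key: str, version_id: str) -> None:
    """Copy an old version over the same key; it becomes the current version."""
    s3.copy_object(
        Bucket=bucket,
        Key=key,
        CopySource={"Bucket": bucket, "Key": key, "VersionId": version_id},
    )


def restore_by_delete(s3: Any, bucket: str, key: str, version_id: str) -> List[str]:
    """Permanently delete every version and delete marker newer than `version_id`.

    Returns the IDs that were deleted. Raises ValueError if the version to
    keep is not found in the listing.
    """
    listing = s3.list_object_versions(Bucket=bucket, Prefix=key)
    # Prefix matching may return other keys (e.g. "a.txt.bak"), so filter them out.
    entries = [
        entry
        for entry in listing.get("Versions", []) + listing.get("DeleteMarkers", [])
        if entry["Key"] == key
    ]
    targets = [e for e in entries if e["VersionId"] == version_id]
    if not targets:
        raise ValueError(f"version {version_id!r} of {key!r} not found")
    newer = [e for e in entries if e["LastModified"] > targets[0]["LastModified"]]
    for entry in newer:
        # Passing VersionId removes that version for good; no marker is added.
        s3.delete_object(Bucket=bucket, Key=key, VersionId=entry["VersionId"])
    return [e["VersionId"] for e in newer]
```

Copying keeps the full history intact, while deleting newer versions is irreversible, so the copy variant is the safer default for a first restore attempt.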

For now, we are going to use the following lifecycle rules for data cleanup (we will set them up with Terraform):

<LifecycleConfiguration>
    <Rule>
        <ID>Clean up</ID>
        <Filter>
            <Prefix></Prefix>
        </Filter>
        <Status>Enabled</Status>
        <Expiration>
            <ExpiredObjectDeleteMarker>true</ExpiredObjectDeleteMarker>
        </Expiration>
        <NoncurrentVersionExpiration>
            <NoncurrentDays>30</NoncurrentDays>
        </NoncurrentVersionExpiration>
    </Rule>
</LifecycleConfiguration>
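
For ad hoc testing outside Terraform, the same cleanup rule could be expressed as the dictionary that boto3's `put_bucket_lifecycle_configuration` accepts. This is a minimal sketch: the function name `cleanup_lifecycle_configuration` and its `noncurrent_days` parameter are hypothetical, and Terraform remains the intended way to apply the rule.

```python
from typing import Any, Dict


def cleanup_lifecycle_configuration(noncurrent_days: int = 30) -> Dict[str, Any]:
    """Build the "Clean up" rule in the shape expected by boto3."""
    return {
        "Rules": [
            {
                "ID": "Clean up",
                # An empty prefix applies the rule to every object in the bucket.
                "Filter": {"Prefix": ""},
                "Status": "Enabled",
                # Remove delete markers that have no remaining versions behind them.
                "Expiration": {"ExpiredObjectDeleteMarker": True},
                # Permanently delete versions N days after they become non-current.
                "NoncurrentVersionExpiration": {"NoncurrentDays": noncurrent_days},
            }
        ]
    }
```

With a boto3 client, the result would be passed as `s3.put_bucket_lifecycle_configuration(Bucket="example-bucket", LifecycleConfiguration=cleanup_lifecycle_configuration())`, where `example-bucket` is a placeholder name.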

If we want to archive old file versions, we can set up AWS Glacier transitions with the following rules. However, we need to remember that this is not going to be a full backup, but only an archive of old file versions.

<LifecycleConfiguration>
    <Rule>
        <ID>Archive and clean up</ID>
        <Filter>
            <Prefix></Prefix>
        </Filter>
        <Status>Enabled</Status>
        <Expiration>
            <ExpiredObjectDeleteMarker>true</ExpiredObjectDeleteMarker>
        </Expiration>
        <NoncurrentVersionTransition>
            <NoncurrentDays>30</NoncurrentDays>
            <StorageClass>STANDARD_IA</StorageClass>
        </NoncurrentVersionTransition>
        <NoncurrentVersionTransition>
            <NoncurrentDays>60</NoncurrentDays>
            <StorageClass>GLACIER</StorageClass>
        </NoncurrentVersionTransition>
        <NoncurrentVersionExpiration>
            <NoncurrentDays>90</NoncurrentDays>
        </NoncurrentVersionExpiration>
    </Rule>
</LifecycleConfiguration>
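
The timeline that these rules produce for a single old version can be worked through with a small sketch. The function name `noncurrent_version_state` is hypothetical, and the sketch assumes that the version was stored in the STANDARD class when it became non-current. S3 applies lifecycle actions asynchronously, so the real transitions may lag behind these boundaries.

```python
from typing import Optional


def noncurrent_version_state(days_noncurrent: int) -> Optional[str]:
    """Storage class of a version after `days_noncurrent` days as non-current.

    Returns None once the version has been expired (permanently deleted).
    """
    if days_noncurrent < 0:
        raise ValueError("days_noncurrent must not be negative")
    if days_noncurrent >= 90:
        return None  # NoncurrentVersionExpiration
    if days_noncurrent >= 60:
        return "GLACIER"  # second NoncurrentVersionTransition
    if days_noncurrent >= 30:
        return "STANDARD_IA"  # first NoncurrentVersionTransition
    return "STANDARD"  # assumed initial storage class
```

For example, a file overwritten 45 days ago would have its old version in STANDARD_IA, and after 90 days that version would no longer be recoverable.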

Potential future improvements

Amazon has recently announced an alternative way of backing up S3 data: AWS Backup, a centralized backup solution. It already supports RDS through its native database snapshots. However, its S3 support is currently in preview and is available only in the US West region. Therefore, it doesn't make sense to look into this now, but it could be a way of making full backups of S3 buckets in the future.

DigitalOcean

Spaces

DigitalOcean's Spaces support lifecycle rules, just like AWS S3 buckets. We can reuse the proposed configuration here.

Note: Spaces do not support version-related transitions at the moment, so archiving old versions is not possible.

MySQL

Managed MySQL instances are backed up daily, with 7-day retention. This is not configurable at the moment.

MongoDB

Managed MongoDB instances are also backed up daily, with 7-day retention. This is not configurable at the moment.