# Data storage and backup
# Docker-compose
# Backup and restore in a simple installation
In the context of a simple installation, data within Ontopic Suite is stored in Docker named volumes.
For Ontopic Suite to function properly, it's essential to back up the ./env and ./default-secrets folders in addition to the named volumes. These named volumes include:
- suite_docs
- suite_repos
- suite_store-db
- suite_git-config
- suite_git-data
- suite_git-mirror-db
- suite_identity-data
# Backup procedure with Docker Desktop
To ensure data integrity and simplify backups, you can use the official Docker Desktop extension "Volumes Backup & Share", available on Docker Hub.
This extension offers several options for exporting and importing volumes:
- Export as a compressed file on your local filesystem.
- Export to a local image.
- Export to an image in Docker Hub or another container registry.
# Backup procedure on a server setting
On a server-based setup, data backup can be achieved by following these steps:
- Halt the containers that rely on the specific volume you wish to back up (you can use docker compose -f docker-compose.yml -f tutorial/docker-compose-tutorial.yml down).
- Create a backup of the volume's directory, which is typically located at /var/lib/docker/volumes/<volume-name>.
Alternatively, you can use a backup script such as the one provided on GitHub.
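The steps above can be scripted. The following sketch assumes a default Docker installation where named volumes live under /var/lib/docker/volumes; the backup_volume helper and all paths are illustrative, not part of Ontopic Suite:

```shell
#!/bin/sh
# Hypothetical helper: archive one Docker named volume directory.
# Arguments: $1 = volumes root, $2 = backup output dir, $3 = volume name.
backup_volume() {
  mkdir -p "$2"
  tar -czf "$2/$3.tar.gz" -C "$1" "$3"
}

# Example usage (run only while the containers are stopped):
#   for vol in suite_docs suite_repos suite_store-db suite_git-config \
#              suite_git-data suite_git-mirror-db suite_identity-data; do
#     backup_volume /var/lib/docker/volumes "/backup/ontopic-$(date +%Y%m%d)" "$vol"
#   done
```

Each volume ends up in its own tarball, so individual volumes can be restored without unpacking the whole backup.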
# Backup and restore in an advanced installation
In an advanced installation of Ontopic Suite (initialized using the init-configuration-local.sh script before launching Ontopic Suite for the first time), data from the containers is stored in bind mounts. By default, this data is saved in a folder named ./volumes, while configuration secrets are stored in ./default-secrets.
To back up the necessary data for Ontopic Suite:
- Stop Ontopic Suite (use docker compose -f docker-compose.yml -f tutorial/docker-compose-tutorial.yml down).
- Back up the following directories: ./env, ./volumes, and ./default-secrets.
For restoration:
- Ensure that Ontopic Suite is not running.
- Clear the contents of the ./volumes directory.
- Insert the backup data.
- Start Ontopic Suite using docker compose -f docker-compose.yml -f tutorial/docker-compose-tutorial.yml up.
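The backup and restore steps above can be sketched as two small shell helpers. The names suite_backup and suite_restore and the single-tarball layout are assumptions for illustration, not part of Ontopic Suite:

```shell
#!/bin/sh
# Hypothetical helper: archive ./env, ./volumes, and ./default-secrets
# from the installation directory into one tarball.
suite_backup() {    # $1 = installation dir, $2 = output tarball
  tar -czf "$2" -C "$1" env volumes default-secrets
}

# Hypothetical helper: clear ./volumes, then unpack the backup in place.
# Run only while Ontopic Suite is stopped.
suite_restore() {   # $1 = installation dir, $2 = backup tarball
  rm -rf "$1/volumes"
  tar -xzf "$2" -C "$1"
}
```

Clearing ./volumes before unpacking mirrors the restore steps above and avoids mixing stale files with restored ones.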
# Volumes and default positions
Within Ontopic Suite, various directories serve specific purposes, and these directories are defined by environment variables. These include:
- STORE_SERVER_DOCS_DIR=./volumes/docs: Stores the LevelDB database used to support multisync and contains the actual Ontopic project data.
- STORE_SERVER_REPOS_DIR=./volumes/repos: Holds a local copy of the Git repository.
- STORE_SERVER_DB_DATA_DIR=./volumes/store-db: Contains the database with permission settings, along with projects and their data source connections. This database can be either Postgres or SQLite, depending on your configuration.
- GIT_MIRROR_CONFIG_DIR=./volumes/git-config: Configuration folder for Gitea, defining settings such as server configuration, database connections, and authentication methods.
- GIT_MIRROR_DATA_DIR=./volumes/git-data: Data storage location for Gitea, containing Git repositories, user avatars, attachments, and other user-generated content.
- GIT_MIRROR_DB_DATA_DIR=./volumes/git-mirror-db: Contains the Gitea Postgres database.
- IDENTITY_SERVICE_DATA_DIR=./volumes/identity-service: Stores session data for the identity service and authentication. It is used only as a cache and does not need to be backed up.
# Manual backup of your projects
Alternatively, you can export your Ontopic Suite projects for backup by clicking the Export project button on the Dashboard. If necessary, you can reimport a project by creating a new project with the same data source connection and clicking the Import project button on the Dashboard.
# Kubernetes
On Kubernetes, data is stored in 5 persistent volume claims:
- repos-dir-ontopic-suite: Git repositories containing all project commits. Managed by the store server.
- docs-dir-ontopic-suite: LevelDB database used to support multisync; contains the current Ontopic project data. Managed by the store server.
- data-store-server-db-postgresql: Data of the internal PostgreSQL database (when not deployed externally).
- runtime-configuration-ontopic-server: Configuration files for deploying SPARQL and semantic SQL endpoints and running materialization jobs.
- endpoint-security-dir-ontopic-server: CA certificate and hashed personal access tokens for accessing semantic SQL endpoints through the PostgreSQL wire protocol.
We recommend using Velero for backing up and restoring Kubernetes cluster resources and persistent volumes. The following section demonstrates how to configure and use it with Azure as the cloud platform.
# Velero on Azure
Velero backs up both Kubernetes resources (Secrets, ConfigMaps, StatefulSets, Services, etc.) and persistent volume data via Azure Disk Snapshots.
Azure Disk Snapshots capture a point-in-time copy of the managed disk. PostgreSQL uses Write-Ahead Logging (WAL) with fsync=on and full_page_writes=on by default, so it can recover to a consistent state from a disk snapshot, just as it would after a power failure.
# Requirements
In addition to the standard tools kubectl, helm, and the Azure CLI, you need to install the Velero CLI.
Your AKS cluster must have workload identity enabled:
az aks update \
--resource-group <aks-resource-group> \
--name <aks-cluster-name> \
--enable-oidc-issuer \
--enable-workload-identity
# Setup
# 1. Set environment variables
# Your AKS cluster (the resource group where the AKS resource was created, not the MC_ one)
AKS_RESOURCE_GROUP=<aks-resource-group>
AKS_CLUSTER_NAME=<aks-cluster-name>
AZURE_SUBSCRIPTION_ID=$(az account show --query id -o tsv)
AZURE_TENANT_ID=$(az account show --query tenantId -o tsv)
# Backup storage (choose a globally unique storage account name)
VELERO_RESOURCE_GROUP=Velero_Backups
VELERO_STORAGE_ACCOUNT=veleroontopic
VELERO_BLOB_CONTAINER=velero
VELERO_LOCATION=westeurope
# Derived values (AKS_NODE_RG is the MC_ resource group where disks are stored)
AKS_NODE_RG=$(az aks show \
-g $AKS_RESOURCE_GROUP \
-n $AKS_CLUSTER_NAME \
--query nodeResourceGroup -o tsv)
AKS_OIDC_ISSUER=$(az aks show \
-g $AKS_RESOURCE_GROUP \
-n $AKS_CLUSTER_NAME \
--query oidcIssuerProfile.issuerUrl -o tsv)
In the following, we are assuming the namespace where Ontopic Suite is installed is ontopic.
# 2. Create Azure storage for backups
az group create \
--name $VELERO_RESOURCE_GROUP \
--location $VELERO_LOCATION
az storage account create \
--name $VELERO_STORAGE_ACCOUNT \
--resource-group $VELERO_RESOURCE_GROUP \
--sku Standard_GRS \
--encryption-services blob \
--https-only true \
--kind BlobStorage \
--access-tier Hot
az storage container create \
--name $VELERO_BLOB_CONTAINER \
--public-access off \
--account-name $VELERO_STORAGE_ACCOUNT
# 3. Create a managed identity for Velero
VELERO_IDENTITY_NAME=velero-identity
az identity create \
--name $VELERO_IDENTITY_NAME \
--resource-group $AKS_RESOURCE_GROUP
VELERO_IDENTITY_CLIENT_ID=$(az identity show \
-g $AKS_RESOURCE_GROUP \
-n $VELERO_IDENTITY_NAME \
--query clientId -o tsv)
# 4. Assign roles
Velero needs access to the backup storage account and to the resource group where AKS disks are stored:
# Storage Blob Data Contributor — read/write backups
STORAGE_ACCOUNT_RESOURCE_ID=$(az storage account show \
--name $VELERO_STORAGE_ACCOUNT \
--query id -o tsv)
az role assignment create \
--assignee $VELERO_IDENTITY_CLIENT_ID \
--role "Storage Blob Data Contributor" \
--scope $STORAGE_ACCOUNT_RESOURCE_ID
# Contributor on the node resource group — create/manage disk snapshots
az role assignment create \
--assignee $VELERO_IDENTITY_CLIENT_ID \
--role "Contributor" \
--scope /subscriptions/$AZURE_SUBSCRIPTION_ID/resourceGroups/$AKS_NODE_RG
Note: Azure role assignments can take up to 10 minutes to propagate. If the backup storage location shows Unavailable right after setup, wait a few minutes and check again.
# 5. Create a federated credential
This allows the Velero service account in Kubernetes to authenticate as the managed identity:
az identity federated-credential create \
--name velero-federated-credential \
--identity-name $VELERO_IDENTITY_NAME \
--resource-group $AKS_RESOURCE_GROUP \
--issuer $AKS_OIDC_ISSUER \
--subject system:serviceaccount:velero:velero-server \
--audience api://AzureADTokenExchange
# 6. Create the credentials secret
With workload identity, the credentials file does not contain passwords or keys. It only tells the Azure plugin which subscription, tenant, and identity to use:
kubectl create namespace velero
cat <<EOF | kubectl create secret generic velero-credentials \
--namespace velero \
--from-file=cloud=/dev/stdin
AZURE_SUBSCRIPTION_ID=$AZURE_SUBSCRIPTION_ID
AZURE_TENANT_ID=$AZURE_TENANT_ID
AZURE_CLIENT_ID=$VELERO_IDENTITY_CLIENT_ID
AZURE_RESOURCE_GROUP=$AKS_NODE_RG
AZURE_FEDERATED_TOKEN_FILE=/var/run/secrets/azure/tokens/azure-identity-token
AZURE_AUTHORITY_HOST=https://login.microsoftonline.com/
EOF
# 7. Install Velero
helm repo add vmware-tanzu https://vmware-tanzu.github.io/helm-charts
helm repo update
helm install velero vmware-tanzu/velero \
--version 11.3.2 \
--namespace velero \
--set credentials.existingSecret=velero-credentials \
--set configuration.backupStorageLocation[0].name=default \
--set configuration.backupStorageLocation[0].provider=velero.io/azure \
--set configuration.backupStorageLocation[0].bucket=$VELERO_BLOB_CONTAINER \
--set configuration.backupStorageLocation[0].config.resourceGroup=$VELERO_RESOURCE_GROUP \
--set configuration.backupStorageLocation[0].config.storageAccount=$VELERO_STORAGE_ACCOUNT \
--set configuration.backupStorageLocation[0].config.subscriptionId=$AZURE_SUBSCRIPTION_ID \
--set-string configuration.backupStorageLocation[0].config.useAAD=true \
--set configuration.volumeSnapshotLocation[0].name=default \
--set configuration.volumeSnapshotLocation[0].provider=velero.io/azure \
--set configuration.volumeSnapshotLocation[0].config.subscriptionId=$AZURE_SUBSCRIPTION_ID \
--set configuration.volumeSnapshotLocation[0].config.resourceGroup=$AKS_NODE_RG \
--set initContainers[0].name=velero-plugin-for-microsoft-azure \
--set initContainers[0].image=velero/velero-plugin-for-microsoft-azure:v1.13.2 \
--set initContainers[0].volumeMounts[0].mountPath=/target \
--set initContainers[0].volumeMounts[0].name=plugins \
--set serviceAccount.server.annotations."azure\.workload\.identity/client-id"=$VELERO_IDENTITY_CLIENT_ID \
--set-string podLabels."azure\.workload\.identity/use"=true
# 8. Verify the installation
kubectl get pods -n velero
velero backup-location get
The backup storage location should show Available.
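Because role assignments propagate slowly, the location may briefly show Unavailable. A small polling helper can wait for it; wait_for_bsl is a hypothetical name, and the sketch assumes the BackupStorageLocation is named default in the velero namespace:

```shell
#!/bin/sh
# Hypothetical helper: poll the "default" BackupStorageLocation until its
# status phase becomes Available. Argument: $1 = max number of attempts.
wait_for_bsl() {
  i=0
  while [ "$i" -lt "$1" ]; do
    phase=$(kubectl -n velero get backupstoragelocation default \
      -o jsonpath='{.status.phase}' 2>/dev/null)
    [ "$phase" = "Available" ] && return 0
    i=$((i + 1))
    sleep 30
  done
  return 1
}
```

For example, wait_for_bsl 20 retries for roughly 10 minutes, matching the role-propagation delay noted above.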
# Creating backups
# Scheduled backups
Create a daily backup of the Ontopic namespace, retained for 30 days:
velero schedule create ontopic-daily \
--schedule="0 2 * * *" \
--include-namespaces ontopic \
--ttl 720h
# On-demand backups
Create a one-off backup before an upgrade or maintenance:
velero backup create pre-upgrade-$(date +%Y%m%d) \
--include-namespaces ontopic \
--wait
# Check backup status
velero backup get
velero backup describe <backup-name> --details
velero backup logs <backup-name>
# Restoring
# Restore into a new namespace
This is the safest approach — it avoids conflicts with existing resources:
velero restore create --from-backup <backup-name> \
--namespace-mappings <original-namespace>:<new-namespace>
# In-place restore
By default, Velero skips resources that already exist. To restore into the same namespace, uninstall the Helm release first, delete the PVCs, then restore:
# Remove all resources managed by the Helm release
helm uninstall ontopic-suite -n ontopic
# Delete the PVCs (retained by helm uninstall, will be restored from backup snapshots)
kubectl delete pvc -n ontopic --all
# Restore
velero restore create --from-backup <backup-name>
# Restore specific resources only
velero restore create --from-backup <backup-name> \
--include-resources persistentvolumeclaims,persistentvolumes
# Check restore status
velero restore get
velero restore describe <restore-name> --details
# Post-restore steps
After a restore, PostgreSQL will automatically replay its WAL to reach a consistent state. No manual intervention is required.
If you restored into a new namespace or a new cluster, you may need to re-create resources that live outside the namespace.
# AWS Marketplace
On an AWS Marketplace installation, back up the data by creating a snapshot of the second volume (/dev/xvdba).