Database Backup & Restore Runbook
Overview
Agentix uses Railway PostgreSQL as its primary data store. The Railway Pro plan provides automatic daily backups with 7-day retention and point-in-time recovery. This runbook documents how to verify that backups are enabled, perform manual backups, and restore through either the Railway UI or pg_restore.
All persistent data lives in PostgreSQL: tenants, users, workflows, workflow versions, contacts, conversations, messages, events, runs, steps, tools, credentials, and audit logs. Data loss here means loss of customer data and workflow definitions.
Environment variables:
- DATABASE_URL: connection string for the primary PostgreSQL instance
1. Verifying Backups Are Enabled
Perform this check monthly or after any Railway infrastructure change.
- Open the Railway Dashboard.
- Navigate to your project > PostgreSQL service.
- Click the Backups tab.
- Confirm the following:
  - Automatic Backups is enabled (enabled by default on Pro plan).
  - Retention period shows 7 days.
  - Last backup timestamp is within the last 24 hours.
- If automatic backups are disabled:
  - Click Enable Backups (Pro plan required).
  - If on a free plan, upgrade to Pro or implement the manual backup procedure below as a workaround.
2. Manual Backup (pg_dump)
Use this for additional safety before major migrations, schema changes, or deployment of breaking changes.
Prerequisites
- pg_dump installed locally (ships with PostgreSQL client tools)
- DATABASE_URL from Railway environment variables
Procedure
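- Get the connection string (DATABASE_URL) from the Railway environment variables.
- Run the backup with pg_dump -Fc:
  - -Fc selects the custom format (compressed, supports selective restore).
  - Output file example: agentix_backup_20260326_143000.dump
- Verify the dump file by listing its contents (for example with pg_restore --list). The listing should show table entries (tenants, users, workflows, etc.).
- Store the backup off-site.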
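The backup and verification steps above can be sketched as a pair of shell functions. This is an illustration under stated assumptions, not the project's actual script: the function names and the ./backups default directory are hypothetical, and DATABASE_URL is assumed to be exported already. Off-site storage is left out because the runbook does not name a destination.

```shell
# Sketch of the manual backup steps. Assumes DATABASE_URL is already exported
# (for example, copied from the Railway Variables tab).

# Build a timestamped file name such as agentix_backup_20260326_143000.dump.
backup_filename() {
  printf 'agentix_backup_%s.dump\n' "$(date +%Y%m%d_%H%M%S)"
}

# backup_database [OUT_DIR]
# Dumps the database in custom format, checks that the archive is readable,
# and prints the path of the dump file. OUT_DIR defaults to ./backups
# (a hypothetical location).
backup_database() {
  out_dir="${1:-./backups}"
  if [ -z "${DATABASE_URL:-}" ]; then
    echo "DATABASE_URL is not set" >&2
    return 1
  fi
  mkdir -p "$out_dir" || return 1
  out_file="$out_dir/$(backup_filename)"

  # -Fc writes the compressed custom format that pg_restore reads.
  pg_dump -Fc --dbname="$DATABASE_URL" --file="$out_file" || return 1

  # pg_restore --list prints the archive's table of contents without
  # connecting to any database; it fails if the archive cannot be read.
  pg_restore --list "$out_file" >/dev/null || return 1

  echo "$out_file"
}

# Usage: backup_database ./backups
```

To eyeball the table entries rather than only checking readability, run pg_restore --list on the printed path and look for tenants, users, and workflows in the output.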
Recommended Schedule
| Timing | Trigger |
|---|---|
| Before any Prisma migration | Manual |
| Before major deploys | Manual |
| Weekly (critical periods) | Cron job |
3. Restore Procedure (Railway UI)
Use this when Railway automatic backups are available and you need a full restore.
Procedure
- Navigate to Railway Dashboard > Project > PostgreSQL service.
- Click the Backups tab.
- Select the backup by timestamp (choose the most recent backup before the incident).
- Click Restore.
  - Railway creates a new database volume with the restored data.
  - The original database is preserved (not overwritten).
- Update DATABASE_URL in your Railway project environment variables:
  - Go to API service > Variables tab.
  - Update DATABASE_URL to the new connection string from the restored PostgreSQL instance.
- Redeploy the API service.
- Verify the restore by calling the health endpoint. Expected response: {"status":"ok","checks":{"db":"ok","redis":"ok"}}
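A scripted version of this health check might look like the sketch below. The function names and the example URL are hypothetical, since the runbook does not give the health endpoint's address. The check uses plain substring matching rather than a JSON parser, so it only confirms that status, db, and redis each report ok somewhere in the body.

```shell
# health_ok BODY
# Succeeds only when BODY reports "ok" for status, db, and redis.
health_ok() {
  # Strip whitespace so pretty-printed and compact JSON compare the same way.
  body=$(printf '%s' "$1" | tr -d '[:space:]')
  case "$body" in
    *'"status":"ok"'*) ;;
    *) return 1 ;;
  esac
  case "$body" in
    *'"db":"ok"'*) ;;
    *) return 1 ;;
  esac
  case "$body" in
    *'"redis":"ok"'*) ;;
    *) return 1 ;;
  esac
}

# check_health URL
# Fetches URL and validates the body. --fail makes curl return non-zero on
# HTTP errors, so a 5xx response is treated as unhealthy.
check_health() {
  response=$(curl --fail --silent --show-error --max-time 10 "$1") || return 1
  health_ok "$response"
}

# Usage (hypothetical URL): check_health "https://api.example.com/health"
```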
Rollback
If the restored database has issues:
- Revert DATABASE_URL to the original connection string.
- Redeploy the API service.
- Investigate the restore issue before retrying.
4. Restore Procedure (pg_restore from dump)
Use this when restoring from a manual pg_dump backup, or when the Railway UI restore is unavailable.
Procedure
- Create a new PostgreSQL service in Railway (or use an existing empty instance):
  - Railway Dashboard > Project > New > Database > PostgreSQL.
  - Copy the new DATABASE_URL from the Variables tab.
- Restore the dump with pg_restore, passing --clean and --if-exists:
  - --clean drops existing objects before recreating them.
  - --if-exists prevents errors when those objects do not exist yet.
- Run Prisma migrations to ensure the schema is up to date.
- Verify that table counts match expected values.
- Update DATABASE_URL on the API service to point to the restored instance.
- Redeploy and verify health.
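The restore, migration, and table-count steps above can be sketched as shell functions. Several details here are assumptions rather than facts from this runbook: the function names are hypothetical, the Prisma schema is assumed to read its connection string from DATABASE_URL, and the table names are assumed to match the lowercase names used in this document (tenants, users, workflows, workflow_versions). The target URL is always passed explicitly so that nothing runs against the production DATABASE_URL by accident.

```shell
# restore_dump DUMP_FILE TARGET_URL
# Restores a custom-format dump into the database at TARGET_URL.
restore_dump() {
  dump_file="$1"
  target_url="$2"
  if [ ! -f "$dump_file" ]; then
    echo "dump file not found: $dump_file" >&2
    return 1
  fi
  # --clean drops existing objects first; --if-exists avoids errors on an
  # empty database where there is nothing to drop yet.
  pg_restore --clean --if-exists --dbname="$target_url" "$dump_file"
}

# apply_migrations TARGET_URL
# Applies pending Prisma migrations to the restored database.
# Assumption: the Prisma schema takes its URL from DATABASE_URL.
apply_migrations() {
  DATABASE_URL="$1" npx prisma migrate deploy
}

# count_rows TARGET_URL TABLE
# Prints the row count of TABLE. TABLE must be a trusted identifier,
# because it is interpolated into the SQL text.
count_rows() {
  psql "$1" --tuples-only --no-align --command "SELECT count(*) FROM \"$2\";"
}

# print_table_counts TARGET_URL
# Prints "table<TAB>count" for a few core tables (names are assumptions).
print_table_counts() {
  for table in tenants users workflows workflow_versions; do
    printf '%s\t%s\n' "$table" "$(count_rows "$1" "$table")"
  done
}

# Usage (hypothetical values):
#   restore_dump agentix_backup_20260326_143000.dump "$RESTORED_URL"
#   apply_migrations "$RESTORED_URL"
#   print_table_counts "$RESTORED_URL"
```

Comparing the printed counts against the same queries run on the original database (when it is still reachable) gives a quick signal that the restore is complete.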
5. Post-Restore Checklist
After any restore (Railway UI or pg_restore), verify:
- Health endpoint returns {"status":"ok","checks":{"db":"ok","redis":"ok"}}
- Recent workflow runs are present (check /api/runs or Railway logs)
- Tenant data integrity: spot-check 2-3 tenants via API or Prisma Studio
- Published workflows load correctly (check that the workflow_versions table has entries)
- Contact tags and groups are intact
- BullMQ workers are processing jobs (check Railway API service logs)
- Monitor Sentry for errors in the first 30 minutes post-restore
- Verify webhook processing is working (send a test WhatsApp message)
6. Testing Schedule
| Frequency | Activity | Purpose |
|---|---|---|
| Monthly | Verify Railway backup tab (section 1) | Confirm backups are running |
| Quarterly | Full restore test to staging database | Validate restore procedure works end-to-end |
| Before major migrations | Manual pg_dump | Safety net for schema changes |
Quarterly Restore Test Procedure
- Create a temporary PostgreSQL instance in Railway.
- Restore the latest automatic backup (Railway UI) or most recent manual dump.
- Point a local API instance at the restored database.
- Run the health check and spot-check 2-3 tenants.
- Delete the temporary PostgreSQL instance.
- Document the test result and date in this section:
| Date | Method | Result | Notes |
|---|---|---|---|
| YYYY-MM-DD | Railway UI / pg_restore | Pass / Fail | Notes |
7. Incident Response
If data loss is detected:
- Do not restart or redeploy, because this may overwrite recovery options.
- Determine the scope: which tables/rows are affected?
- Check Railway backup tab for the most recent clean backup.
- If Railway backup is available: follow Restore Procedure (Railway UI).
- If manual dump is more recent: follow Restore Procedure (pg_restore).
- If partial data loss: consider point-in-time queries or selective pg_restore.
- After restore: run the Post-Restore Checklist.
- Post-incident: document root cause and update this runbook if needed.
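For the partial data loss case, a selective pg_restore could look like the sketch below. It restores a single table from a custom-format dump into a scratch database, so the recovered rows can be inspected and copied back deliberately instead of overwriting production. The function name and the scratch database are hypothetical. Note that pg_restore --table restores only the named table and makes no attempt to restore objects it depends on.

```shell
# restore_single_table DUMP_FILE SCRATCH_URL TABLE
# Restores only TABLE (definition and data) from DUMP_FILE into the
# database at SCRATCH_URL, which should be an empty throwaway instance.
restore_single_table() {
  dump_file="${1:-}"
  scratch_url="${2:-}"
  table="${3:-}"
  if [ -z "$table" ] || [ -z "$scratch_url" ] || [ ! -f "$dump_file" ]; then
    echo "usage: restore_single_table DUMP_FILE SCRATCH_URL TABLE" >&2
    return 1
  fi
  # --table limits the restore to the named table; everything else in the
  # archive is skipped.
  pg_restore --table="$table" --dbname="$scratch_url" "$dump_file"
}

# Usage (hypothetical values):
#   restore_single_table agentix_backup_20260326_143000.dump "$SCRATCH_URL" contacts
```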