qvest-task/docs/diagrams/update-workflow.md
gitea_admin 685de1816d feat: implement update automation and backup system with CI tests (#1)
- Diun monitors Docker images
- Automated updates for nginx, manual approval for gitea/postgres
- Weekly cert renewal automation via cron
- Health checks with automatic rollback on failure
- AWS SES email notifications on update failures
- Daily S3 backups + pre-update snapshots
- Integration tests with Gitea Actions quality gate
- Change domain from gitea.poll-streams.com to git.poll-streams.com
- Add diagrams
2026-06-11 15:51:48 +00:00

# Update Workflow
This diagram shows the complete automated update workflow for the Gitea deployment, including update detection, automatic and manual update paths, rollback procedures, and certificate renewal.
## Overview
- **Diun** monitors for container updates weekly (Sunday 3:00 AM)
- **Automatic updates** for low-risk containers (nginx)
- **Manual approval** required for critical containers (gitea, postgres)
- **Backup before update** with automatic rollback on failure
- **Certificate renewal** runs separately (Sunday 3:30 AM)
- **Email notifications** via AWS SES for available updates, update failures, and successful manual updates
## Update Workflow Diagram
```mermaid
%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#e5e7eb','primaryTextColor':'#111827','primaryBorderColor':'#9ca3af','lineColor':'#111827','secondaryColor':'#d1d5db','tertiaryColor':'#f3f4f6','edgeLabelBackground':'#ffffff','mainBkg':'#f5f5f4','nodeBorder':'#9ca3af','background':'#f5f5f4','clusterBkg':'transparent'},'themeCSS':'.node rect, .node circle, .node ellipse, .node polygon, .node path { filter: none !important; box-shadow: none !important; } .cluster rect { filter: none !important; box-shadow: none !important; } svg { background-color: #f5f5f4 !important; } .cluster-label { background-color: #ffffff !important; padding: 6px 12px !important; border-radius: 4px !important; font-size: 16px !important; font-weight: 700 !important; box-shadow: 0 1px 3px rgba(0,0,0,0.12) !important; border: 1px solid #d1d5db !important; } .edgePath, .edgePath path, .flowchart-link { z-index: 1 !important; }'}}%%
flowchart TD
Start([Sunday 3:00 AM<br/>Cron Trigger])
Diun{Diun<br/>Check Updates}
Policy{Update Policy?}
%% Automatic Path (nginx)
AutoEmail[📧 Email: nginx update available]
AutoCron([auto-update.sh<br/>Cron Execution])
AutoBackup[🗄️ Backup Database & Data<br/>to S3]
AutoBackupFail{Backup<br/>Success?}
AutoPull[📥 Pull New Image<br/>nginx:latest-version]
AutoRecreate[🔄 Recreate Container<br/>docker compose up]
AutoHealth{Health Check<br/>Pass?}
AutoRollback[↩️ Rollback<br/>Restore Previous Image]
AutoRollbackHealth{Rollback<br/>Health OK?}
AutoSuccess[✅ Update Complete<br/>Log Success]
AutoFailEmail[📧 Email: Update Failed]
%% Manual Path (gitea/postgres)
ManualEmail[📧 Email: Critical Update Available<br/>gitea or postgres]
OperatorReview{Operator<br/>Reviews & Approves}
ManualRun([Operator runs<br/>manual-update.sh])
ManualConfirm{Confirm<br/>Update?}
ManualBackup[🗄️ Backup Database & Data<br/>to S3]
ManualBackupFail{Backup<br/>Success?}
ManualPull[📥 Pull New Image<br/>gitea:x.y.z or postgres:x.y]
ManualRecreate[🔄 Recreate Container<br/>docker compose up]
ManualHealth{Health Check<br/>Pass?}
ManualRollback[↩️ Rollback<br/>Restore Previous Image]
ManualRollbackHealth{Rollback<br/>Health OK?}
ManualSuccess[✅ Update Complete<br/>Email Success]
ManualFailEmail[📧 Email: Update Failed]
ManualAbort[❌ Update Aborted]
%% Certificate Renewal Path
CertStart([Sunday 3:30 AM<br/>Cron Trigger])
CertRenew[🔐 Certbot Renew<br/>docker compose run certbot]
CertCheck{Certificate<br/>Renewed?}
CertRestart[🔄 Restart nginx<br/>docker compose restart]
CertSuccess[✅ Certificate Valid]
CertSkip[No Renewal Needed]
%% Flow connections
Start --> Diun
Diun -->|Updates Found| Policy
Diun -->|No Updates| End1[End]
%% Automatic Path
Policy -->|automatic<br/>nginx| AutoEmail
AutoEmail --> AutoCron
AutoCron --> AutoBackup
AutoBackup --> AutoBackupFail
AutoBackupFail -->|❌ Failed| AutoFailEmail
AutoFailEmail --> End2[End]
AutoBackupFail -->|✅ Success| AutoPull
AutoPull --> AutoRecreate
AutoRecreate --> AutoHealth
AutoHealth -->|✅ Healthy| AutoSuccess
AutoSuccess --> End3[End]
AutoHealth -->|❌ Unhealthy| AutoRollback
AutoRollback --> AutoRollbackHealth
AutoRollbackHealth -->|✅ Healthy| AutoFailEmail
AutoRollbackHealth -->|❌ Still Failed| AutoFailEmail
%% Manual Path
Policy -->|manual<br/>gitea/postgres| ManualEmail
ManualEmail --> OperatorReview
OperatorReview -->|Later| End4[End]
OperatorReview -->|Now| ManualRun
ManualRun --> ManualConfirm
ManualConfirm -->|No| ManualAbort
ManualAbort --> End5[End]
ManualConfirm -->|Yes| ManualBackup
ManualBackup --> ManualBackupFail
ManualBackupFail -->|❌ Failed| ManualFailEmail
ManualFailEmail --> End6[End]
ManualBackupFail -->|✅ Success| ManualPull
ManualPull --> ManualRecreate
ManualRecreate --> ManualHealth
ManualHealth -->|✅ Healthy| ManualSuccess
ManualSuccess --> End7[End]
ManualHealth -->|❌ Unhealthy| ManualRollback
ManualRollback --> ManualRollbackHealth
ManualRollbackHealth -->|✅ Healthy| ManualFailEmail
ManualRollbackHealth -->|❌ Still Failed| ManualFailEmail
%% Certificate Renewal Path (separate flow)
CertStart --> CertRenew
CertRenew --> CertCheck
CertCheck -->|New Cert| CertRestart
CertRestart --> CertSuccess
CertSuccess --> End8[End]
CertCheck -->|Not Due| CertSkip
CertSkip --> End9[End]
%% Styling
classDef trigger fill:#F59E0B,stroke:#B45309,stroke-width:2px,color:#111827
classDef decision fill:#F97316,stroke:#C2410C,stroke-width:2px,color:#111827
classDef action fill:#3B82F6,stroke:#1D4ED8,stroke-width:2px,color:#ffffff
classDef success fill:#10B981,stroke:#047857,stroke-width:2px,color:#111827
classDef failure fill:#EF4444,stroke:#B91C1C,stroke-width:2px,color:#ffffff
classDef operator fill:#8B5CF6,stroke:#6D28D9,stroke-width:2px,color:#ffffff
classDef monitor fill:#F59E0B,stroke:#B45309,stroke-width:2px,color:#111827
classDef email fill:#6366F1,stroke:#4338CA,stroke-width:2px,color:#ffffff
classDef backup fill:#8B5CF6,stroke:#6D28D9,stroke-width:2px,color:#ffffff
class Start,AutoCron,ManualRun,CertStart trigger
class Diun monitor
class Policy,AutoBackupFail,AutoHealth,AutoRollbackHealth,ManualBackupFail,ManualHealth,ManualRollbackHealth,ManualConfirm,CertCheck decision
class OperatorReview operator
class AutoBackup,ManualBackup backup
class AutoPull,AutoRecreate,AutoRollback,ManualPull,ManualRecreate,ManualRollback,CertRenew,CertRestart action
class AutoSuccess,ManualSuccess,CertSuccess,CertSkip success
class AutoFailEmail,ManualFailEmail,ManualAbort failure
class AutoEmail,ManualEmail email
```
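Both paths in the diagram share the same backup, pull, recreate, health-check and rollback sequence. The Python sketch below models that sequence as a plain function so each branch outcome is easy to follow; it is not the contents of `auto-update.sh` or `manual-update.sh`. The `UpdateSteps` hooks and the outcome strings are hypothetical: in a real deployment they would wrap the S3 backup, the `docker compose` pull and `up` steps, the health checks and the SES email.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class UpdateSteps:
    """Hypothetical hooks for one service; real ones would call docker compose, S3, SES."""
    backup: Callable[[], bool]        # True if the pre-update S3 backup succeeded
    pull: Callable[[], None]          # pull the new image
    recreate: Callable[[], None]      # recreate the container on the new image
    health_check: Callable[[], bool]  # True if the service is healthy
    rollback: Callable[[], None]      # restore the previous image
    notify: Callable[[str], None]     # send an email to the operator


def run_update(service: str, steps: UpdateSteps, email_on_success: bool) -> str:
    """Walk one service through the diagram's update path and return the outcome."""
    # Abort before touching the running container if the backup failed.
    if not steps.backup():
        steps.notify(f"{service}: backup failed, update not attempted")
        return "backup_failed"

    steps.pull()
    steps.recreate()
    if steps.health_check():
        # Only the manual path emails on success (the automatic path logs instead).
        if email_on_success:
            steps.notify(f"{service}: update complete")
        return "updated"

    # Unhealthy after the update: roll back and check again.
    steps.rollback()
    if steps.health_check():
        steps.notify(f"{service}: update failed, rolled back to previous image")
        return "rolled_back"
    steps.notify(f"{service}: update failed and rollback is unhealthy")
    return "rollback_failed"
```

Injecting the steps keeps the control flow testable without Docker. Passing `email_on_success=True` corresponds to the manual path; the automatic path would pass `False`.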
## Update Policies
### Automatic (Low Risk)
- **nginx**: Reverse proxy with stateless configuration
- Process: Detected → Email (informational) → Backup → Update → Health Check → Success/Rollback
- No operator intervention required
### Manual (High Risk)
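- **gitea**: Git hosting application whose data volume holds the repositories and other user data
- **postgres**: Database holding Gitea's metadata (users, issues, pull requests, permissions); the Git repositories themselves live in the Gitea data volume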
- Process: Detected → Email → Operator Reviews → Approval → Backup → Update → Health Check → Success/Rollback
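Which policy a container follows is driven by its labels (see Monitoring below). A minimal sketch of that classification, assuming the labels have been read into a plain dict (for example from `docker inspect`). The function name and the choice to treat a missing or unknown policy as manual are illustrative assumptions, not taken from the deployment's scripts.

```python
from typing import Mapping, Optional

AUTOMATIC = "automatic"
MANUAL = "manual"


def update_policy(labels: Mapping[str, str]) -> Optional[str]:
    """Return the update policy for a container, or None if it is not watched."""
    # Only containers that opt in with diun.enable=true are monitored.
    if labels.get("diun.enable", "").strip().lower() != "true":
        return None
    policy = labels.get("update.policy", "").strip().lower()
    # Fail safe: anything other than an explicit "automatic" needs approval.
    return AUTOMATIC if policy == AUTOMATIC else MANUAL
```

Defaulting to manual means a typo in the label can only delay an update; it can never cause gitea or postgres to be updated unattended.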
## Safety Mechanisms
1. **Pre-Update Backup**: Database and Gitea data backed up to S3 before any changes
2. **Health Checks**: Services validated after update (container running, postgres responding, gitea accessible, nginx config valid)
3. **Automatic Rollback**: Failed health check triggers immediate rollback to previous image
4. **Email Notifications**: Operator notified of:
   - Available updates (nginx for information, gitea and postgres for approval)
   - Update failures (all containers)
   - Successful updates (manual containers only)
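The health checks in item 2 could be scripted roughly as follows. This is a sketch under several assumptions: the compose service names (`postgres`, `gitea`, `nginx`), running from the compose project directory, and a Gitea health endpoint reachable at `http://localhost:3000/api/healthz` are all guesses about this deployment. Because `docker compose exec` fails when the target container is not running, the postgres and nginx checks also cover the "container running" condition for those services.

```python
import subprocess
from typing import Callable, Dict, List, Sequence

# Assumed commands: service names and the Gitea URL/port are guesses for this deployment.
HEALTH_CHECKS: Dict[str, List[str]] = {
    # pg_isready exits 0 when the server accepts connections.
    "postgres responding": ["docker", "compose", "exec", "-T", "postgres", "pg_isready"],
    # curl -f exits non-zero on HTTP error statuses; -sS hides progress but keeps errors.
    "gitea accessible": ["curl", "-fsS", "http://localhost:3000/api/healthz"],
    # nginx -t validates the configuration without reloading it.
    "nginx config valid": ["docker", "compose", "exec", "-T", "nginx", "nginx", "-t"],
}


def run_command(cmd: Sequence[str]) -> bool:
    """Return True if the command exits with status 0 within 30 seconds."""
    try:
        return subprocess.run(cmd, capture_output=True, timeout=30).returncode == 0
    except (OSError, subprocess.TimeoutExpired):
        # A missing binary or a hung command counts as a failed check.
        return False


def failed_checks(runner: Callable[[Sequence[str]], bool] = run_command) -> List[str]:
    """Run every check; an empty list means healthy, anything else should trigger rollback."""
    return [name for name, cmd in HEALTH_CHECKS.items() if not runner(cmd)]
```

On a live host `failed_checks()` runs the real commands; passing a fake `runner` exercises the logic without Docker.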
## Certificate Renewal
Runs separately at 3:30 AM on Sundays:
- Certbot checks certificate expiration
- Renews if within 30 days of expiry
- Restarts nginx to load new certificate
- Process is idempotent: if no certificate is due, nothing is renewed and nginx is not restarted, so running it every week is safe
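The renewal decision, and why weekly runs are harmless, can be modelled in a few lines. This is an illustrative sketch rather than certbot's own logic: `renew` and `restart_nginx` are hypothetical callables standing in for the `docker compose run certbot` and `docker compose restart` steps from the diagram, and the 30-day window mirrors the figure above.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable

RENEW_BEFORE = timedelta(days=30)  # renewal window described above


def needs_renewal(not_after: datetime, now: datetime) -> bool:
    """True when the certificate expires within the renewal window."""
    return not_after - now < RENEW_BEFORE


def weekly_cert_job(
    not_after: datetime,
    now: datetime,
    renew: Callable[[], datetime],
    restart_nginx: Callable[[], None],
) -> datetime:
    """One Sunday 3:30 run: renew only if due, restart nginx only after a renewal."""
    if not needs_renewal(not_after, now):
        # "Not Due" branch: nothing changes, so repeated runs are harmless.
        return not_after
    new_not_after = renew()
    restart_nginx()  # load the new certificate
    return new_not_after
```

Assuming no run is missed, the previous weekly run saw at least 30 days remaining, so the first run that renews sees between 23 and 30 days of validity left.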
## Monitoring
**Diun Configuration**:
- Schedule: `0 3 * * 0` (Sunday 3:00 AM)
- Monitors: postgres, gitea, nginx, diun
- Email: Via AWS SES SMTP
- Labels: Containers marked with `diun.enable=true` and `update.policy=automatic|manual`
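The `0 3 * * 0` schedule fires once a week, on Sunday at 03:00, and the certificate job follows at 03:30. A small helper that computes the next matching run can help when checking why an update has not been picked up yet. It is a hypothetical sketch that assumes naive datetimes in the server's local time zone and ignores DST transitions. Note that cron numbers Sunday as 0, while Python's `datetime.weekday()` numbers it as 6.

```python
from datetime import datetime, timedelta

SUNDAY = 6  # datetime.weekday(): Monday=0 ... Sunday=6 (cron uses 0 for Sunday)


def next_weekly_run(now: datetime, weekday: int = SUNDAY, hour: int = 3, minute: int = 0) -> datetime:
    """Next datetime strictly after `now` matching a weekly cron such as `0 3 * * 0`."""
    # Move to the target time of day, then forward to the target weekday.
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    candidate += timedelta(days=(weekday - now.weekday()) % 7)
    # If that slot is today but already passed (or is exactly now), use next week.
    if candidate <= now:
        candidate += timedelta(days=7)
    return candidate
```

For example, from Thursday 2026-06-11 15:51 the next Diun check is Sunday 2026-06-14 03:00, and `next_weekly_run(now, minute=30)` gives the 03:30 certificate run on the same day.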