
Architecture Decision Records (ADR)

This document tracks all significant architectural decisions made during the project, including rationale and trade-offs.


ADR-001: Cloud Provider - AWS

Date: 2026-06-08
Status: Accepted

Decision: Use Amazon Web Services (AWS)

Rationale:

  • Industry-standard cloud provider with comprehensive service portfolio
  • Access to managed services when beneficial
  • Strong ecosystem and community support
  • Terraform has excellent AWS provider support

ADR-002: Infrastructure as Code - Terraform

Date: 2026-06-08
Status: Accepted

Decision: Use Terraform for infrastructure provisioning

Rationale:

  • Declarative approach (aligns with project philosophy)
  • Industry standard for cloud infrastructure
  • Excellent AWS provider
  • State management enables reproducibility

Scope: VPC, EC2, Security Groups, S3, Route 53


ADR-003: Configuration Management - Ansible

Date: 2026-06-08
Status: Accepted

Decision: Use Ansible for system configuration (kept minimal)

Rationale:

  • Avoids user-data scripts, which proved difficult to debug
  • Idempotent - can re-run if setup fails
  • Real-time output visibility via SSH
  • Professional separation of concerns: Terraform (infra) → Ansible (config) → Docker (apps)

Scope: Install Docker, configure system basics, set up firewall

Philosophy: Keep Ansible simple (no elaborate roles or extra complexity)

Alternative Considered: User-data scripts - rejected due to debugging difficulty and one-shot nature
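Idempotence is what makes a failed run safe to repeat. The hypothetical helper below is a minimal illustration in Python, not Ansible. It checks the current state first and only writes when something differs, which is roughly what Ansible's `lineinfile` module does for a single line.

```python
from pathlib import Path


def ensure_line(path: Path, line: str) -> bool:
    """Make sure `line` appears in the file at `path` (hypothetical helper).

    Returns True if the file was changed and False if it was already in the
    desired state, the same "changed" signal Ansible reports per task.
    """
    text = path.read_text() if path.exists() else ""
    if line in text.splitlines():
        return False  # already converged: re-running changes nothing
    # Keep the new line on its own row even if the file lacks a trailing newline.
    prefix = "" if text == "" or text.endswith("\n") else "\n"
    path.write_text(text + prefix + line + "\n")
    return True
```

Calling it twice with the same arguments returns `True` and then `False`, and the file ends up identical either way.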


ADR-004: Application Deployment - Docker + Docker Compose

Date: 2026-06-08
Status: Accepted

Decision: Use Docker with Docker Compose for application orchestration

Rationale:

  • Fully declarative (docker-compose.yml)
  • Easy to test locally (dev/prod parity)
  • Simple version control and updates
  • Gitea has official Docker images
  • Portable and reproducible

Scope: Gitea, nginx, PostgreSQL, monitoring stack (later)


ADR-005: Database - Self-Hosted PostgreSQL in Docker

Date: 2026-06-08
Status: Accepted

Decision: PostgreSQL container, not RDS

Rationale:

  • Simpler architecture (everything in docker-compose.yml)
  • Shows ability to build and manage backups ourselves
  • More control over configuration
  • Cost-effective
  • PostgreSQL is Gitea's recommended database

Trade-offs:

  • Pros: Greater control, cost-effective, simpler architecture
  • Cons: Requires custom backup automation and testing

Backup Strategy: Custom scripts with pg_dump to S3 (detailed in backup phase)

Future Consideration: For higher availability requirements or larger scale, RDS would provide managed backups, point-in-time recovery, and Multi-AZ deployment
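Below is a minimal Python sketch of the pg_dump-to-S3 step. It assumes the database runs as a Compose service named `postgres`, with a role and database both called `gitea`. Those names, the bucket and the key layout are all hypothetical. The sketch streams a custom-format dump into `aws s3 cp -`, so the dump never has to be written to the EC2 disk.

```python
import subprocess
from datetime import datetime, timezone


def dump_command(service: str = "postgres", user: str = "gitea", db: str = "gitea") -> list[str]:
    # -T: no pseudo-TTY, so the binary dump can be piped safely.
    # -Fc: custom-format archive (compressed, restorable with pg_restore).
    return ["docker", "compose", "exec", "-T", service,
            "pg_dump", "-U", user, "-Fc", db]


def s3_key(now: datetime, prefix: str = "backups/postgres") -> str:
    # Hypothetical layout: one object per run, grouped and sortable by UTC date.
    return f"{prefix}/{now:%Y-%m-%d}/gitea-{now:%Y%m%dT%H%M%SZ}.dump"


def upload_command(bucket: str, key: str) -> list[str]:
    # A source of "-" makes `aws s3 cp` read the object body from stdin.
    return ["aws", "s3", "cp", "-", f"s3://{bucket}/{key}"]


def backup_to_s3(bucket: str) -> str:
    """Stream pg_dump output straight to S3; return the object key."""
    key = s3_key(datetime.now(timezone.utc))
    dump = subprocess.Popen(dump_command(), stdout=subprocess.PIPE)
    upload = subprocess.Popen(upload_command(bucket, key), stdin=dump.stdout)
    dump.stdout.close()  # lets pg_dump see a broken pipe if the upload dies
    upload_rc, dump_rc = upload.wait(), dump.wait()
    if dump_rc != 0 or upload_rc != 0:
        raise RuntimeError(f"backup failed (pg_dump={dump_rc}, aws={upload_rc})")
    return key
```

Streaming has one caveat. If pg_dump fails partway, the upload may still create a truncated object, so a production script should delete or flag the object whenever the function raises.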


ADR-006: Reverse Proxy - Nginx

Date: 2026-06-08
Status: Accepted

Decision: Nginx as reverse proxy

Rationale:

  • Lightweight and performant
  • Simple configuration for basic proxying
  • Industry standard
  • Works well in Docker

Scope: SSL termination, proxy to Gitea, HTTP→HTTPS redirect


ADR-007: SSL Certificates - Let's Encrypt

Date: 2026-06-08 (Updated 2026-06-11)
Status: Accepted

Decision: Let's Encrypt with certbot

Rationale:

  • Free, automated, trusted certificates
  • Trusted by all major browsers (no certificate warnings)
  • Auto-renewal reduces operational burden
  • Industry-standard solution for SSL/TLS
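Renewal automation boils down to a periodic expiry check, for example from a weekly cron job. Let's Encrypt certificates are short-lived (90 days by default), so the check has to run well inside that window. The Python sketch below uses only the standard `ssl` and `socket` modules. The 30-day threshold and the function names are assumptions, not certbot's internals.

```python
import socket
import ssl
import time
from typing import Optional

RENEW_BEFORE_DAYS = 30  # assumption: renew once fewer than 30 days remain


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left, given a certificate notAfter string such as 'Jan  5 09:34:43 2018 GMT'."""
    expires = ssl.cert_time_to_seconds(not_after)  # UTC seconds since the epoch
    current = time.time() if now is None else now
    return (expires - current) / 86400


def needs_renewal(not_after: str, now: Optional[float] = None) -> bool:
    return days_until_expiry(not_after, now) < RENEW_BEFORE_DAYS


def served_not_after(host: str, port: int = 443) -> str:
    """Fetch notAfter from the certificate the live server actually presents."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

Usage example: `needs_renewal(served_not_after("git.poll-streams.com"))`. It checks the certificate the server actually serves rather than the file on disk. That also catches a renewal that succeeded but was never picked up because nginx was not reloaded.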

Requirement: Valid domain name pointing to server

Domain: git.poll-streams.com (changed from gitea.poll-streams.com)

Implementation Note: Initially hit Let's Encrypt's duplicate-certificate rate limit (5 certificates per week for the exact same set of hostnames) on gitea.poll-streams.com. Because that limit applies per exact hostname set, switching to a new hostname (git.poll-streams.com) allowed a production certificate to be issued immediately. Production certificates were obtained successfully.
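The rate limit is keyed on the exact set of hostnames in the certificate. That is why a new name sidesteps it while the old name stays blocked. The minimal Python approximation below treats the limit as 5 certificates per rolling 7 days. Let's Encrypt's documentation describes the precise accounting, which may differ. The function and data layout are hypothetical.

```python
from datetime import datetime, timedelta

LIMIT = 5                   # certificates per exact set of hostnames...
WINDOW = timedelta(days=7)  # ...within a rolling 7-day window (approximation)


def can_issue(history: dict[frozenset[str], list[datetime]],
              names: set[str], now: datetime) -> bool:
    """Approximate check: is there room for one more certificate for exactly `names`?"""
    key = frozenset(names)  # the set is what counts, not the order of names
    recent = [t for t in history.get(key, []) if now - t < WINDOW]
    return len(recent) < LIMIT
```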


ADR-008: Update Automation - Diun + Custom Scripts

Date: 2026-06-08
Status: Accepted (Updated 2026-06-09)

Decision: Diun (Docker Image Update Notifier) for monitoring + custom bash scripts for orchestration

Rationale:

  • Diun monitors for updates and sends email notifications (built-in)
  • Enables differentiated update policies per container
  • Custom scripts provide full control over update workflow
  • Supports pre-update backups and health checks
  • Allows manual approval for critical components (Gitea, PostgreSQL)
  • Auto-update for low-risk components (nginx, certbot)
  • Demonstrates production-level engineering (not just "update everything")

Update Strategy:

  • Schedule: Weekly checks during off-hours
  • Nginx/Certbot: Automatic updates after backup
  • Gitea/PostgreSQL: Email notification, manual approval required
  • Backup: Pre-update backup to S3 (database + Gitea data)
  • Health Checks: Post-update validation
  • Rollback: Automatic rollback on health check failure
  • Notifications: Email alerts on critical failures, logs for successful updates
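The per-container policy amounts to a lookup from image repository to action, and this Python sketch mirrors the strategy above in a table. Several details are assumptions: the image names (`nginx`, `certbot/certbot`, `gitea/gitea`, `postgres`), the simplified image-reference parsing, and the `"ignore"` default for unlisted images such as Diun itself.

```python
POLICY = {
    "nginx": "auto",            # low risk: update after backup
    "certbot/certbot": "auto",
    "gitea/gitea": "manual",    # critical: notify and wait for the operator
    "postgres": "manual",
}


def repository(image_ref: str) -> str:
    """Strip registry host, 'library/' prefix, tag and digest from an image reference."""
    name = image_ref.split("@", 1)[0]              # drop any @sha256:... digest
    last = name.rsplit("/", 1)[-1]
    if ":" in last:                                # a colon in the last segment is a tag
        name = name[: len(name) - len(last)] + last.split(":", 1)[0]
    parts = name.split("/")
    first = parts[0]
    if len(parts) > 1 and ("." in first or ":" in first or first == "localhost"):
        parts = parts[1:]                          # first component is a registry host
    if parts[0] == "library":                      # Docker Hub official-image namespace
        parts = parts[1:]
    return "/".join(parts)


def action_for(image_ref: str) -> str:
    return POLICY.get(repository(image_ref), "ignore")
```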

Scope:

  • Diun container monitors all Docker images
  • auto-update.sh - automated update for nginx/certbot
  • manual-update.sh - operator-approved update for gitea/postgres
  • Health check and rollback logic

Alternative Considered: Watchtower - rejected because it does not fit the required workflow: it has no manual-approval step for critical containers, no health-check-based rollback, and no built-in way to take pre-update backups to S3
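The backup, update, health-check and rollback sequence described for auto-update.sh and manual-update.sh can be outlined in Python. Each step is passed in as a function, which keeps the control flow explicit and testable. In the actual scripts these would be docker compose, backup and HTTP probe commands. The retry count, the delay and all names here are assumptions.

```python
import time
from typing import Callable


def wait_healthy(probe: Callable[[], bool], attempts: int = 10, delay: float = 6.0,
                 sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call `probe` until it returns True or the attempts run out."""
    for i in range(attempts):
        if probe():
            return True
        if i < attempts - 1:
            sleep(delay)  # give the container time to start before retrying
    return False


def update_with_rollback(backup: Callable[[], None], apply_update: Callable[[], None],
                         rollback: Callable[[], None], probe: Callable[[], bool],
                         notify: Callable[[str], None], **wait_options) -> bool:
    """Backup -> update -> health check; roll back and alert on failure."""
    backup()  # if the backup raises, nothing has been changed yet
    try:
        apply_update()  # e.g. pull the new image and recreate the container
        healthy = wait_healthy(probe, **wait_options)
    except Exception:
        healthy = False  # a failed pull/recreate is treated like a failed check
    if healthy:
        return True
    rollback()  # e.g. re-pin the previous image tag and recreate
    notify("update failed health check; rolled back to previous version")
    return False
```

Keeping `sleep` injectable lets a test exercise the failure path instantly. An exception during the update is treated the same as a failed health check, so a broken pull still ends in a rollback and an alert.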


ADR-012: CI/CD - Gitea Actions with Self-Hosted Runners

Date: 2026-06-11
Status: Accepted

Decision: Use Gitea Actions with self-hosted runners for CI/CD

Rationale:

  • Native integration with Gitea (no external CI service)
  • Self-hosted runners provide full control and security
  • GitHub Actions-compatible workflow syntax (familiar, well-documented)
  • Enables automated testing before merging changes
  • Demonstrates production-grade CI/CD practices

Implementation:

  • Runners: two act_runner v0.2.10 instances running as systemd services
  • Automation: Ansible playbook (setup-runner.yml) for reproducible deployment
  • Runner Registration: Automated via Gitea API with token from AWS Secrets Manager
  • Networking: Host network mode for job containers to access Gitea
  • Registration URL: https://git.poll-streams.com (public URL for git clone operations)
  • Workflow: .gitea/workflows/test.yml runs integration tests on PRs
  • Features: Docker layer caching, artifact uploads, workflow_dispatch support

Technical Details:

  • Each runner has a dedicated config directory (/etc/act_runner-{1,2})
  • Configuration includes host networking to allow job containers to reach services
  • Runners registered with public URL to avoid localhost connection issues
  • Systemd manages runner lifecycle with automatic restart
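The two-runner layout can be sketched as a Python generator that renders one systemd unit and one partial act_runner config per instance, much as an Ansible template might. `act_runner daemon --config`, the `container.network` key and systemd's `Restart=always` are standard settings. The binary path, the `act_runner` service user and the restart delay are assumptions. A full config would normally start from `act_runner generate-config`.

```python
from textwrap import dedent

ACT_RUNNER_BIN = "/usr/local/bin/act_runner"  # assumption: install path


def unit_file(index: int) -> str:
    """systemd unit for runner N; Restart=always gives the automatic restart."""
    conf_dir = f"/etc/act_runner-{index}"
    # assumption: the act_runner user exists and belongs to the docker group
    return dedent(f"""\
        [Unit]
        Description=Gitea Actions runner {index}
        Requires=docker.service
        After=docker.service network-online.target
        Wants=network-online.target

        [Service]
        User=act_runner
        WorkingDirectory={conf_dir}
        ExecStart={ACT_RUNNER_BIN} daemon --config {conf_dir}/config.yaml
        Restart=always
        RestartSec=10

        [Install]
        WantedBy=multi-user.target
        """)


def config_fragment(index: int) -> str:
    """Partial act_runner config.yaml: per-runner state file and host networking."""
    return dedent(f"""\
        runner:
          file: /etc/act_runner-{index}/.runner
        container:
          network: host
        """)
```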

Benefits:

  • Automated quality gates before merging
  • Consistent test environment (matches CI exactly)
  • Fast feedback on code changes
  • Self-contained solution (no third-party CI service)


ADR-009: Monitoring - Prometheus + Grafana

Date: 2026-06-08
Status: Accepted (implementation later)

Decision: Prometheus for metrics, Grafana for visualization

Rationale:

  • Industry standard monitoring stack
  • Powerful querying with PromQL
  • Rich visualization and alerting capabilities
  • Strong community and pre-built dashboards

Note: To be implemented in a later phase


ADR-010: Logging - Loki + Promtail

Date: 2026-06-08
Status: Accepted (implementation later)

Decision: Loki for log aggregation, Promtail for collection

Rationale:

  • Lighter-weight than the ELK stack
  • Integrates with Grafana (single pane of glass)
  • Good fit for Docker environments

Note: To be implemented in a later phase


ADR-011: Backup Strategy - Custom Scripts + S3

Date: 2026-06-08
Status: Accepted (implementation later)

Decision: Bash scripts with pg_dump and AWS S3

Rationale:

  • Simple and maintainable
  • Full control over backup process and scheduling
  • S3 is designed for 99.999999999% (11 nines) annual object durability
  • Easy to test and validate restore procedures

Scope:

  • Database backups (pg_dump)
  • Gitea repository data
  • Configuration files
  • Automated scheduling with cron

Note: Details to be designed in the backup phase
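pg_dump covers only the database, so Gitea repositories and configuration files need a file-level archive. This standard-library Python sketch creates a `.tar.gz` and records its SHA-256 checksum. It then validates a copy by re-checking the checksum and confirming that a key file is present. The key file used here, `gitea/conf/app.ini`, is a hypothetical path. This is a cheap first check before a full test restore.

```python
import hashlib
import tarfile
from pathlib import Path


def sha256sum(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB reads
            h.update(chunk)
    return h.hexdigest()


def create_archive(src_dir: Path, archive: Path) -> str:
    """Write src_dir into a .tar.gz and return the archive's SHA-256 checksum."""
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(src_dir, arcname=src_dir.name)  # store relative paths only
    return sha256sum(archive)


def verify_archive(archive: Path, expected_sha256: str, must_contain: str) -> bool:
    """Cheap restore check: checksum matches and a key file is present."""
    if sha256sum(archive) != expected_sha256:
        return False
    with tarfile.open(archive, "r:gz") as tar:
        return must_contain in tar.getnames()
```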


Technology Stack Summary

| Layer | Technology | Rationale |
|---|---|---|
| Cloud | AWS | Industry standard |
| Infrastructure | Terraform | Declarative IaC |
| Configuration | Ansible (minimal) | System setup, avoids user-data |
| Compute | EC2 | Flexible VM hosting |
| Application | Docker Compose | Declarative orchestration |
| Database | PostgreSQL (Docker) | Self-managed, shows control |
| Reverse Proxy | Nginx | Lightweight, standard |
| SSL | Let's Encrypt | Free, automated, professional |
| DNS | Route 53 | AWS-native |
| Updates | Diun + Scripts | Per-container policies, backup/rollback |
| CI/CD | Gitea Actions | Self-hosted runners, native integration |
| Backups | Scripts + S3 | Custom, controlled |
| Monitoring | Prometheus + Grafana | Industry standard |
| Logging | Loki + Promtail | Lightweight, integrated |

Core Principles

  1. Simplicity First: Avoid overengineering
  2. Declarative Over Imperative: Terraform, Docker Compose
  3. Infrastructure as Code: Everything version-controlled
  4. Show Control: Build things ourselves where it demonstrates skill
  5. Professional: Production-grade practices