# Architecture Decision Records (ADR)

This document tracks all significant architectural decisions made during the project, including rationale and trade-offs.

---

## ADR-001: Cloud Provider - AWS

**Date**: 2026-06-08

**Status**: Accepted

**Decision**: Use Amazon Web Services (AWS)

**Rationale**:
- Industry-standard cloud provider with comprehensive service portfolio
- Access to managed services when beneficial
- Strong ecosystem and community support
- Terraform has excellent AWS provider support

---

## ADR-002: Infrastructure as Code - Terraform

**Date**: 2026-06-08

**Status**: Accepted

**Decision**: Use Terraform for infrastructure provisioning

**Rationale**:
- Declarative approach (aligns with project philosophy)
- Industry standard for cloud infrastructure
- Excellent AWS provider
- State management enables reproducibility

**Scope**: VPC, EC2, Security Groups, S3, Route 53

---

## ADR-003: Configuration Management - Ansible

**Date**: 2026-06-08

**Status**: Accepted

**Decision**: Use Ansible for system configuration (kept minimal)

**Rationale**:
- Avoids user-data scripts, which proved hard to debug in past experience
- Idempotent: can be re-run if setup fails
- Real-time output visibility via SSH
- Professional separation of concerns: Terraform (infra) → Ansible (config) → Docker (apps)

**Scope**: Install Docker, configure system basics, set up firewall

**Philosophy**: Keep Ansible simple - no fancy roles or complexity

**Alternative Considered**: User-data scripts - rejected due to debugging difficulty and one-shot nature

---

## ADR-004: Application Deployment - Docker + Docker Compose

**Date**: 2026-06-08

**Status**: Accepted

**Decision**: Use Docker with Docker Compose for application orchestration

**Rationale**:
- Fully declarative (docker-compose.yml)
- Easy to test locally (dev/prod parity)
- Simple version control and updates
- Gitea has official Docker images
- Portable and reproducible

**Scope**: Gitea, nginx, PostgreSQL, monitoring stack (later)

---

## ADR-005: Database - Self-Hosted PostgreSQL in Docker

**Date**: 2026-06-08

**Status**: Accepted

**Decision**: PostgreSQL container, not RDS

**Rationale**:
- Simpler architecture (everything in docker-compose.yml)
- Shows ability to build and manage backups ourselves
- More control over configuration
- Cost-effective
- PostgreSQL is Gitea's recommended database

**Trade-offs**:
- **Pros**: Greater control, cost-effective, simpler architecture
- **Cons**: Requires custom backup automation and testing

**Backup Strategy**: Custom scripts with pg_dump to S3 (detailed in backup phase)

**Future Consideration**: For higher availability requirements or larger scale, RDS would provide managed backups, point-in-time recovery, and Multi-AZ deployment

---

## ADR-006: Reverse Proxy - Nginx

**Date**: 2026-06-08

**Status**: Accepted

**Decision**: Nginx as reverse proxy

**Rationale**:
- Lightweight and performant
- Simple configuration for basic proxying
- Industry standard
- Works well in Docker

**Scope**: SSL termination, proxy to Gitea, HTTP→HTTPS redirect

---

## ADR-007: SSL Certificates - Let's Encrypt

**Date**: 2026-06-08 (Updated 2026-06-11)

**Status**: Accepted

**Decision**: Let's Encrypt with certbot

**Rationale**:
- Free, automated, trusted certificates
- Trusted by all major browsers (no certificate warnings)
- Auto-renewal reduces operational burden
- Industry-standard solution for SSL/TLS

**Requirement**: Valid domain name pointing to server

**Domain**: git.poll-streams.com (changed from gitea.poll-streams.com)

**Implementation Note**: Initially hit a Let's Encrypt rate limit (5 certificates per week, which applies to repeated issuance for the same set of hostnames). Resolved by migrating to a fresh domain identifier (git.poll-streams.com), which allowed immediate production certificate issuance. Production certificates were obtained successfully.
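
**Illustrative Sketch (renewal check)**: The auto-renewal rationale above comes down to a simple date comparison: renew once the certificate is close enough to expiry. The following is a minimal sketch of that check, not the project's actual tooling. The 30-day window, the 90-day certificate lifetime, the function names, and the example dates are all assumptions made for illustration; the real threshold depends on the installed certbot version and its configuration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: renew once 30 days or fewer of validity remain.
# Check the renewal policy of the installed certbot version before relying on this.
RENEWAL_WINDOW = timedelta(days=30)


def days_remaining(not_after: datetime, now: datetime) -> int:
    """Whole days of validity left (negative once the certificate has expired)."""
    return (not_after - now).days


def renewal_due(not_after: datetime, now: datetime,
                window: timedelta = RENEWAL_WINDOW) -> bool:
    """True when the certificate expires within the renewal window."""
    return not_after - now <= window


if __name__ == "__main__":
    # Hypothetical dates for illustration only.
    issued = datetime(2026, 6, 11, tzinfo=timezone.utc)
    expires = issued + timedelta(days=90)  # assumed 90-day certificate lifetime
    today = datetime(2026, 8, 15, tzinfo=timezone.utc)
    print(days_remaining(expires, today), renewal_due(expires, today))
```

A check like this could also feed a monitoring alert that fires if a certificate is still unrenewed well inside the window, which would catch a silently failing renewal job.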

---

## ADR-008: Update Automation - Diun + Custom Scripts

**Date**: 2026-06-08

**Status**: Accepted (Updated 2026-06-09)

**Decision**: Diun (Docker Image Update Notifier) for monitoring + custom bash scripts for orchestration

**Rationale**:
- Diun monitors for updates and sends email notifications (built-in)
- Enables differentiated update policies per container
- Custom scripts provide full control over update workflow
- Supports pre-update backups and health checks
- Allows manual approval for critical components (Gitea, PostgreSQL)
- Auto-update for low-risk components (nginx, certbot)
- Demonstrates production-level engineering (not just "update everything")

**Update Strategy**:
- **Schedule**: Weekly checks during off-hours
- **Nginx/Certbot**: Automatic updates after backup
- **Gitea/PostgreSQL**: Email notification, manual approval required
- **Backup**: Pre-update backup to S3 (database + Gitea data)
- **Health Checks**: Post-update validation
- **Rollback**: Automatic rollback on health check failure
- **Notifications**: Email alerts on critical failures, logs for successful updates

**Scope**:
- Diun container monitors all Docker images
- `auto-update.sh` - automated update for nginx/certbot
- `manual-update.sh` - operator-approved update for gitea/postgres
- Health check and rollback logic

**Alternative Considered**: Watchtower - rejected because it lacks per-container policies, pre-update backups, and proper notification support

---

## ADR-012: CI/CD - Gitea Actions with Self-Hosted Runners

**Date**: 2026-06-11

**Status**: Accepted

**Decision**: Use Gitea Actions with self-hosted runners for CI/CD

**Rationale**:
- Native integration with Gitea (no external CI service)
- Self-hosted runners provide full control and security
- GitHub Actions-compatible workflow syntax (familiar, well-documented)
- Enables automated testing before merging changes
- Demonstrates production-grade CI/CD practices

**Implementation**:
- **Runners**: 2x act_runner v0.2.10 instances as systemd services
- **Automation**: Ansible playbook (setup-runner.yml) for reproducible deployment
- **Runner Registration**: Automated via Gitea API with token from AWS Secrets Manager
- **Networking**: Host network mode for job containers to access Gitea
- **Registration URL**: https://git.poll-streams.com (public URL for git clone operations)
- **Workflow**: .gitea/workflows/test.yml runs integration tests on PRs
- **Features**: Docker layer caching, artifact uploads, workflow_dispatch support

**Technical Details**:
- Each runner has dedicated config directory (/etc/act_runner-{1,2})
- Configuration includes host networking to allow job containers to reach services
- Runners registered with public URL to avoid localhost connection issues
- Systemd manages runner lifecycle with automatic restart

**Benefits**:
- Automated quality gates before merging
- Consistent test environment (matches CI exactly)
- Fast feedback on code changes
- Self-contained solution (no external dependencies)

---

## ADR-009: Monitoring - Prometheus + Grafana

**Date**: 2026-06-08

**Status**: Accepted (implementation later)

**Decision**: Prometheus for metrics, Grafana for visualization

**Rationale**:
- Industry standard monitoring stack
- Powerful querying with PromQL
- Rich visualization and alerting capabilities
- Strong community and pre-built dashboards

**Note**: To be implemented in later phase

---

## ADR-010: Logging - Loki + Promtail

**Date**: 2026-06-08

**Status**: Accepted (implementation later)

**Decision**: Loki for log aggregation, Promtail for collection

**Rationale**:
- Lightweight compared to ELK stack
- Integrates with Grafana (single pane of glass)
- Good fit for Docker environments

**Note**: To be implemented in later phase

---

## ADR-011: Backup Strategy - Custom Scripts + S3

**Date**: 2026-06-08

**Status**: Accepted (implementation later)

**Decision**: Bash scripts with pg_dump and AWS S3

**Rationale**:
- Simple and maintainable
- Full control over backup process and scheduling
- S3 provides highly durable storage (99.999999999%)
- Easy to test and validate restore procedures

**Scope**:
- Database backups (pg_dump)
- Gitea repository data
- Configuration files
- Automated scheduling with cron

**Note**: Details to be designed in backup phase

---

## Technology Stack Summary

| Layer | Technology | Rationale |
|-------|-----------|-----------|
| **Cloud** | AWS | Industry standard |
| **Infrastructure** | Terraform | Declarative IaC |
| **Configuration** | Ansible (minimal) | System setup, avoids user-data |
| **Compute** | EC2 | Flexible VM hosting |
| **Application** | Docker Compose | Declarative orchestration |
| **Database** | PostgreSQL (Docker) | Self-managed, shows control |
| **Reverse Proxy** | Nginx | Lightweight, standard |
| **SSL** | Let's Encrypt | Free, automated, professional |
| **DNS** | Route 53 | AWS-native |
| **Updates** | Diun + Scripts | Per-container policies, backup/rollback |
| **CI/CD** | Gitea Actions | Self-hosted runners, native integration |
| **Backups** | Scripts + S3 | Custom, controlled |
| **Monitoring** | Prometheus + Grafana | Industry standard |
| **Logging** | Loki + Promtail | Lightweight, integrated |

---

## Core Principles

1. **Simplicity First**: Avoid overengineering
2. **Declarative Over Imperative**: Terraform, Docker Compose
3. **Infrastructure as Code**: Everything version-controlled
4. **Show Control**: Build things ourselves where it demonstrates skill
5. **Professional**: Production-grade practices