aviyadeveloper 890a23e8d5
feat: complete CI/CD automation and fix deployment issues
Infrastructure & Permissions:
- Set recovery_window_in_days=0 on secrets for immediate deletion on destroy
- Add secretsmanager:UpdateSecret permission to EC2 IAM role
- Move SES secret definition from ses.tf to secrets.tf for better organization
- Create scripts/empty-s3-bucket.sh to handle versioned S3 object deletion
- Update Makefile to use S3 cleanup script in full-destroy target

Gitea Admin User Automation:
- Remove non-functional GITEA_ADMIN_* environment variables from docker-compose.yml
- Add CLI-based admin user creation via docker exec in deploy-gitea.yml
- Add database update to disable must_change_password requirement
- Fix runner token API call to use GET instead of POST

Runner Setup Fixes:
- Change runner gitea_instance to http://localhost:3000 (was failing with public URL)
- Fix registration to work from same host as Gitea

Domain Migration:
- Change domain from gitea.poll-streams.com to git.poll-streams.com
- Update DNS, docker-compose, nginx configs, ansible inventory, and SSL setup
- Enables fresh SSL certificate (avoids Let's Encrypt rate limit)

All changes enable zero-to-one deployment: make full-destroy && make full-deploy
2026-06-11 17:16:51 +02:00


Roadmap

This is the implementation roadmap for the project. It outlines the key milestones and features as incremental steps, giving development and deployment a structured order.

Phase 1: Conceptualization and Planning

This phase is carried out through discussion and research only (no code is written in this phase) and includes the following steps:

1.1 Requirements Analysis

  • Define the scope and requirements of the project
  • Identify constraints and non-functional requirements
  • Determine host environment (cloud provider, VPS, or local)

1.2 Technology Selection

Decisions documented in ADR.md

  • Cloud: AWS
  • Infrastructure as Code: Terraform
  • Configuration Management: Ansible (kept minimal)
  • Application Deployment: Docker + Docker Compose
  • Database: PostgreSQL (self-hosted in Docker)
  • Reverse Proxy: Nginx
  • SSL: Let's Encrypt with certbot
  • Update Automation: Diun + Custom Scripts
  • Monitoring: Prometheus + Grafana (later phase)
  • Logging: Loki + Promtail (later phase)
  • Backup: Custom scripts + S3 (later phase)

1.3 Architecture Design

  • Overall system architecture designed
  • Network topology planned (VPC, subnets, security groups)
  • Three architecture diagrams created in docs/diagrams/

1.4 Project Structure

  • Directory structure planned (will create incrementally per phase)
  • Documentation structure in place (docs/diagrams/)
  • Naming conventions: lowercase, hyphens for files, descriptive names
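The naming convention above is concrete enough to check automatically. The sketch below is one possible reading of it (lowercase words joined by hyphens, plus lowercase extensions); the regex, the `follows_convention` helper and the exception list for traditionally capitalized files are assumptions, not an existing project tool.

```python
import re

# Assumed reading of the convention: lowercase words joined by hyphens,
# followed by optional lowercase extensions (e.g. deploy-gitea.yml).
NAME_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*(?:\.[a-z0-9]+)*")

# Files that are conventionally capitalized (assumed exceptions).
EXCEPTIONS = {"README.md", "ROADMAP.md", "ADR.md", "Makefile"}


def follows_convention(filename: str) -> bool:
    """Return True if a file name matches the assumed naming convention."""
    return filename in EXCEPTIONS or NAME_RE.fullmatch(filename) is not None
```

For example, deploy-gitea.yml and empty-s3-bucket.sh pass, while setup_ssl.yml (underscore) would be flagged.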

Goals:

  • A complete, clear roadmap for the project, maintained in this file
  • Technology stack documented with rationale (see ADR.md)
  • Architecture diagrams created (3 diagrams in docs/diagrams/)
  • Project structure planned

Phase 1 Complete! Ready to begin Phase 2 (Infrastructure Setup).


Phase 2: Infrastructure Setup

This phase provisions the AWS infrastructure using Terraform.

2.1 Terraform Backend Setup

  • Configure AWS CLI and credentials locally
  • Set up Terraform backend (S3 bucket for state storage)
  • Initialize Terraform working directory

2.2 Core Infrastructure

  • Create VPC with single public subnet
  • Set up Internet Gateway
  • Configure Security Group for EC2 (ports 22, 80, 443)
  • Provision EC2 instance (t3.medium, Ubuntu 24.04) with IAM role
  • Create S3 bucket for backups (with versioning & encryption)
  • Configure Route 53 DNS records (A record: git.poll-streams.com → EC2)
  • Use official Terraform AWS modules (VPC, Security Group)
  • Refactored into separate files: main.tf, vpc.tf, security.tf, compute.tf, storage.tf, iam.tf, dns.tf, outputs.tf

2.3 Security Configuration

  • Configure SSH key-based authentication (Ed25519, generated via Terraform)
  • SSH access allowed from anywhere (0.0.0.0/0); security relies on key-based authentication
  • Apply IAM policies (AmazonS3FullAccess for EC2 backups)
  • Security group follows least privilege (inbound only on 22, 80, 443; all outbound allowed)
  • Encrypted EBS root volume (30 GB gp3)

Goals:

  • AWS infrastructure fully defined in Terraform code
  • EC2 instance provisioned and accessible via SSH
  • S3 backup bucket created
  • Domain DNS configured and resolving
  • Infrastructure can be torn down with terraform destroy and recreated with terraform apply

Phase 2 Complete! Ready to begin Phase 3 (Automated Gitea Deployment).


Phase 3: Automated Gitea Deployment

This phase implements the automated, reproducible Gitea installation.

3.1 Database Setup

  • PostgreSQL 18.4 deployed via Docker Compose
  • Database credentials stored in AWS Secrets Manager
  • Random password generation via Terraform
  • Volume mounted at /var/lib/postgresql (the data layout expected by the PostgreSQL 18+ Docker image)
  • Health checks configured with pg_isready

3.2 Gitea Installation

  • Gitea 1.22.6 deployed via Docker Compose
  • Ansible playbooks created: setup-system.yml, deploy-gitea.yml, setup-ssl.yml, site.yml
  • Docker + AWS CLI installation automated
  • Gitea configured with environment variables (database, domain, ROOT_URL)
  • SSH git access on port 2222
  • Volumes for persistent data

3.3 Reverse Proxy Configuration

  • Nginx 1.27-alpine deployed via Docker Compose
  • Let's Encrypt SSL certificate obtained via certbot
  • Two-stage nginx config (HTTP-only first so the ACME challenge can be served, then HTTPS once the certificate exists)
  • SSL termination at nginx, proxy to Gitea on port 3000
  • HTTP to HTTPS redirect configured
  • Security headers (HSTS, X-Frame-Options, etc.)
  • WebSocket support for real-time features
  • 512 MB upload limit (nginx client_max_body_size)
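The two-stage approach exists because nginx refuses to start when ssl_certificate points at a file that does not exist yet. A minimal Python sketch that renders both stages is below; the certbot webroot (/var/www/certbot), the Compose service name gitea and the helper `render_nginx_conf` are assumptions, and the project's real nginx files may differ. For brevity it sets Connection "upgrade" unconditionally; nginx's WebSocket documentation recommends a map on $http_upgrade for mixed traffic.

```python
def render_nginx_conf(domain: str, stage: int) -> str:
    """Render a hypothetical two-stage nginx config for `domain`.

    stage 1: HTTP only, serves ACME challenges so certbot can obtain a cert.
    stage 2: HTTP redirects to HTTPS; HTTPS terminates TLS and proxies to Gitea.
    """
    if stage not in (1, 2):
        raise ValueError("stage must be 1 or 2")
    webroot = "/var/www/certbot"  # assumed webroot shared with the certbot container
    http = [
        "server {",
        "    listen 80;",
        f"    server_name {domain};",
        "    location /.well-known/acme-challenge/ {",
        f"        root {webroot};",
        "    }",
    ]
    if stage == 1:
        # No certificate exists yet, so only the HTTP-01 challenge is answered.
        http += ["    location / {", "        return 404;", "    }", "}"]
        return "\n".join(http) + "\n"
    # Stage 2 keeps the ACME location (for renewals) and redirects everything else.
    http += ["    location / {", "        return 301 https://$host$request_uri;", "    }", "}"]
    live = f"/etc/letsencrypt/live/{domain}"
    https = [
        "server {",
        "    listen 443 ssl;",
        f"    server_name {domain};",
        f"    ssl_certificate {live}/fullchain.pem;",
        f"    ssl_certificate_key {live}/privkey.pem;",
        '    add_header Strict-Transport-Security "max-age=31536000" always;',
        '    add_header X-Frame-Options "SAMEORIGIN" always;',
        "    client_max_body_size 512m;",
        "    location / {",
        "        proxy_pass http://gitea:3000;",  # assumed Compose service name
        "        proxy_http_version 1.1;",
        "        proxy_set_header Host $host;",
        "        proxy_set_header Upgrade $http_upgrade;",
        '        proxy_set_header Connection "upgrade";',
        "        proxy_set_header X-Forwarded-Proto $scheme;",
        "    }",
        "}",
    ]
    return "\n".join(http + [""] + https) + "\n"
```

Deployment would write the stage 1 output, run certbot, then switch to the stage 2 output and reload nginx.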

3.4 Testing

  • HTTPS access verified: https://git.poll-streams.com
  • Valid SSL certificate (Let's Encrypt)
  • HTTP → HTTPS redirect working
  • Gitea web interface accessible and functional
  • User account created, repository created
  • Git push via HTTPS tested successfully
  • Full deployment reproducible via ansible-playbook site.yml
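The HTTP to HTTPS redirect check can be scripted so it is repeatable after each deployment. A small sketch using Python's standard http.client, which does not follow redirects, so the 301 and its Location header stay visible. The helper names are hypothetical, and accepting 308 as well as 301 is an assumption.

```python
import http.client
from typing import Optional
from urllib.parse import urlsplit


def is_https_redirect(status: int, location: Optional[str], host: str) -> bool:
    """True for a permanent redirect (301/308) to https on the same host."""
    if status not in (301, 308) or not location:
        return False
    target = urlsplit(location)
    return target.scheme == "https" and target.hostname == host


def check_redirect(host: str, path: str = "/", timeout: float = 5.0) -> bool:
    """Send a plain-HTTP HEAD request and inspect the redirect it returns."""
    conn = http.client.HTTPConnection(host, 80, timeout=timeout)
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return is_https_redirect(resp.status, resp.getheader("Location"), host)
    finally:
        conn.close()
```

Against the live system this would be called as check_redirect("git.poll-streams.com").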

Goals:

  • Gitea running and accessible via HTTPS through reverse proxy
  • Installation fully automated and reproducible
  • Production-grade deployment with SSL

Phase 3 Complete! Gitea is fully deployed, secured with SSL, and accessible from the internet.


Phase 4: Update Automation

This phase implements automated update mechanisms for Gitea and related components.

4.1 Update Strategy Design

  • Weekly update checks (Sunday 3:00 AM)
  • Per-container update policies (automatic vs manual)
  • Pre-update backup to S3
  • Post-update health checks
  • Automatic rollback on failure
  • Email notifications via AWS SES
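The update strategy above is a fixed sequence with one branch: back up, update, check health, then either report success or roll back. The project implements this in bash; the Python sketch below only illustrates the control flow. Every step is injected as a callable, and `run_update`, the policy values "auto"/"manual" and the `confirm` hook are hypothetical names. Rollback is assumed to restore the previously running image.

```python
from dataclasses import dataclass, field
from typing import Callable

Step = Callable[[str], None]


@dataclass
class UpdateResult:
    container: str
    updated: bool
    rolled_back: bool
    log: list = field(default_factory=list)


def run_update(container: str, policy: str, *, backup: Step, pull: Step,
               recreate: Step, healthy: Callable[[str], bool], rollback: Step,
               notify: Callable[[str], None],
               confirm: Callable[[str], bool] = lambda c: False) -> UpdateResult:
    """Backup -> pull -> recreate -> health check, rolling back on failure."""
    log = []
    # Manual-policy containers (gitea, postgres) need an explicit operator yes.
    if policy == "manual" and not confirm(container):
        log.append("skipped: not confirmed")
        return UpdateResult(container, False, False, log)
    # A backup failure raises here and aborts before anything is changed.
    backup(container)
    log.append("backup")
    try:
        pull(container)
        log.append("pull")
        recreate(container)
        log.append("recreate")
        ok = healthy(container)
    except Exception as exc:  # treat any step error like a failed health check
        log.append(f"error: {exc}")
        ok = False
    if ok:
        log.append("healthy")
        notify(f"{container}: update succeeded")
        return UpdateResult(container, True, False, log)
    rollback(container)
    log.append("rollback")
    notify(f"{container}: update failed, rolled back")
    return UpdateResult(container, False, True, log)
```

Injecting the steps keeps the orchestration testable without Docker, which mirrors the "no live services required" goal of the quality gate.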

4.2 Update Monitoring

  • Diun 4.33 deployed for Docker image update detection
  • Scheduled weekly checks (cron: 0 3 * * 0)
  • Monitors: postgres, gitea, nginx, diun
  • Email notifications configured via AWS SES SMTP
  • IAM user created for SMTP credentials
  • Labels define update policies per container
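To make the cron expression 0 3 * * 0 concrete (minute 0, hour 3, any day of month, any month, day-of-week 0 = Sunday), here is a sketch that computes the next firing time of such a weekly entry. The function name is hypothetical. It uses naive datetimes, so it assumes the scheduler and the caller share one timezone and ignores DST transitions.

```python
from datetime import datetime, timedelta


def next_weekly_run(now: datetime, cron_dow: int = 0, hour: int = 3,
                    minute: int = 0) -> datetime:
    """Next firing time of a weekly cron entry 'minute hour * * cron_dow'."""
    py_weekday = (cron_dow - 1) % 7  # cron: 0 (or 7) = Sunday; Python: Monday = 0
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    candidate += timedelta(days=(py_weekday - now.weekday()) % 7)
    if candidate <= now:
        # Already past this week's slot (or exactly on it): use next week's.
        candidate += timedelta(days=7)
    return candidate
```

For example, from Thursday 2026-06-11 17:16 the next check falls on Sunday 2026-06-14 at 03:00.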

4.3 Automated Scripts

  • backup.sh: Database + Gitea data backup to S3 bucket
  • health-check.sh: Validates all services running and responsive
  • auto-update.sh: Automatic updates for low-risk containers (nginx)
    • Backup before update
    • Pull new image
    • Recreate container
    • Health check validation
    • Automatic rollback on failure
    • Email notifications
  • manual-update.sh: Manual updates for critical containers (gitea/postgres)
    • Operator confirmation required
    • Same safety flow as auto-update
    • Success/failure notifications
  • test-update.sh: Quality gate for CI/local validation
    • Validates script syntax
    • Checks required functions
    • Verifies control flow logic
    • Tests error handling patterns
    • No live services required

4.4 Cron Jobs

  • Weekly automatic update (nginx only): Sunday 3:15 AM
  • Weekly certificate renewal: Sunday 3:30 AM
  • Daily backups: 2:00 AM
  • All configured via Ansible (setup-cron.yml)
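Written out as crontab entries, the schedule above would look roughly like the output of this sketch. The script paths, the /opt/gitea working directory and `render_crontab` are hypothetical; setup-cron.yml holds the real entries. Diun's Sunday 03:00 check is assumed to use Diun's own scheduler and is therefore not listed.

```python
# Hypothetical paths and commands; setup-cron.yml defines the real entries.
JOBS = [
    ("daily backup", "0 2 * * *", "/opt/gitea/scripts/backup.sh"),
    ("weekly nginx auto-update", "15 3 * * 0", "/opt/gitea/scripts/auto-update.sh"),
    ("weekly certificate renewal", "30 3 * * 0",
     "cd /opt/gitea && docker compose run --rm certbot renew && docker compose restart nginx"),
]


def render_crontab(jobs: list) -> str:
    """Render (name, schedule, command) tuples as commented crontab lines."""
    lines = []
    for name, schedule, command in jobs:
        # minute hour day-of-month month day-of-week (0 = Sunday)
        fields = schedule.split()
        if len(fields) != 5:
            raise ValueError(f"{name}: expected 5 cron fields, got {len(fields)}")
        spec = " ".join(fields)
        lines.append(f"# {name}")
        lines.append(f"{spec} {command}")
    return "\n".join(lines) + "\n"
```

Spacing the Sunday jobs 15 minutes apart means the nginx update and the certificate renewal, which both restart nginx, do not overlap.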

4.5 Certificate Renewal

  • Automated weekly renewal check via cron
  • Uses certbot container: docker compose run --rm certbot renew
  • Restarts nginx to load new certificates
  • Process is idempotent (certbot only renews certificates that are close to expiry, so running it weekly is safe)

4.6 Testing & Validation

  • Integration tests created (test-update.sh)
  • All scripts tested on live system
  • Cron jobs verified
  • Email notifications tested
  • Diun monitoring confirmed (4 containers)
  • Update workflow diagram created
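The static checks that test-update.sh performs (listed in 4.3: syntax, required functions, error handling, no live services) could look roughly like this in Python. The names in REQUIRED_FUNCTIONS are hypothetical, and the regexes only recognize the common bash function forms `name() {`, `function name() {` and `function name {`.

```python
import re
import subprocess

# Hypothetical names; substitute whatever the real scripts define.
REQUIRED_FUNCTIONS = ("backup", "health_check", "rollback", "notify")

# Matches `name()`, `function name()` and `function name` at the start of a line.
_FUNC_DEF = re.compile(
    r"^[ \t]*(?:function[ \t]+)?([A-Za-z_]\w*)[ \t]*\([ \t]*\)"
    r"|^[ \t]*function[ \t]+([A-Za-z_]\w*)\b",
    re.MULTILINE,
)


def defined_functions(script: str) -> set:
    """Names of all bash functions defined in `script`."""
    return {m.group(1) or m.group(2) for m in _FUNC_DEF.finditer(script)}


def missing_functions(script: str, required=REQUIRED_FUNCTIONS) -> list:
    """Required function names that `script` does not define."""
    return sorted(set(required) - defined_functions(script))


def has_errexit(script: str) -> bool:
    """True if the script enables `set -e` (alone or combined, e.g. -euo)."""
    return re.search(r"^[ \t]*set[ \t]+-[A-Za-z]*e", script, re.MULTILINE) is not None


def syntax_ok(path: str) -> bool:
    """Parse the script without executing it, via bash's -n (noexec) flag."""
    return subprocess.run(["bash", "-n", path], capture_output=True).returncode == 0
```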

Goals:

  • Automated update system operational
  • Update process tested and validated on live system
  • Rollback procedure implemented and tested
  • Quality gate for CI/local environments
  • Documentation complete (workflow diagram)

Implementation Summary:

  • 5 bash scripts following best practices (DRY, error handling, logging)
  • Diun monitoring with AWS SES email notifications
  • Per-container update policies (automatic: nginx, manual: gitea/postgres)
  • Pre-update backups with automatic rollback on failure
  • Certificate renewal automation
  • Static quality-gate script (test-update.sh) plus end-to-end testing on the live system
  • Visual workflow documentation

Phase 4 Complete! Update automation fully operational with safety mechanisms.


Phase 5: Backup Strategy Implementation

This phase implements comprehensive backup solutions.

5.1 Backup Concept Document

  • Document backup strategy (3-2-1 rule)
  • Define backup scope (database, repos, config, etc.)
  • Define retention policy
  • Define RTO (recovery time objective) and RPO (recovery point objective) targets
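The 3-2-1 rule (at least three copies of the data, on at least two different storage media, with at least one copy offsite) can be written down as a checkable plan. In this sketch the `Copy` record, the media labels and the example plan are illustrative assumptions. Treating same-region S3 as "offsite" only means "off the EC2 host"; a cross-region copy would be stronger. On RPO: with the daily 2:00 AM backup from 4.4, the worst case is about 24 hours of lost changes (a failure just before the next run).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    name: str
    media: str      # storage type, e.g. "ebs" or "s3"
    offsite: bool   # True if the copy lives outside the EC2 host


def check_3_2_1(copies: list) -> list:
    """Return the 3-2-1 rules a backup plan violates (empty list = compliant)."""
    problems = []
    if len(copies) < 3:
        problems.append(f"need at least 3 copies, found {len(copies)}")
    if len({c.media for c in copies}) < 2:
        problems.append("fewer than 2 storage media")
    if not any(c.offsite for c in copies):
        problems.append("no offsite copy")
    return problems
```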

5.2 Backup Implementation

  • Automate database backups
  • Automate Gitea data directory backups
  • Automate configuration backups
  • Set up backup storage (local + remote)
  • Implement backup rotation and cleanup
  • Schedule automated backups
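One common rotation scheme keeps the newest N daily backups plus the newest M weekly ones. The sketch below assumes one backup per date and treats Sunday backups as the weekly set; N = 7 and M = 4 are placeholders rather than a decided policy, and the function names are hypothetical. For the S3 copy, bucket lifecycle rules are an alternative to pruning from a script.

```python
from collections.abc import Iterable
from datetime import date


def backups_to_keep(dates: Iterable, daily: int = 7, weekly: int = 4) -> set:
    """Keep the newest `daily` backups plus the newest `weekly` Sunday backups."""
    newest_first = sorted(set(dates), reverse=True)
    keep = set(newest_first[:daily])
    # date.weekday(): Monday == 0 ... Sunday == 6
    sundays = [d for d in newest_first if d.weekday() == 6]
    keep.update(sundays[:weekly])
    return keep


def backups_to_delete(dates: Iterable, daily: int = 7, weekly: int = 4) -> list:
    """Backups outside the retention window, oldest first."""
    unique = set(dates)
    keep = backups_to_keep(unique, daily, weekly)
    return sorted(unique - keep)
```

With 30 consecutive daily backups ending Thursday 2026-06-11, this keeps 10 (the last seven days plus the Sundays May 17, May 24 and May 31) and deletes the other 20.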

5.3 Recovery Testing

  • Document restore procedures
  • Test database restore
  • Test full system restore
  • Document recovery time

Goals:

  • Automated backup system operational
  • Restore procedures tested and documented
  • Backup strategy document completed

Phase 6: Monitoring Implementation

This phase implements monitoring for system health and performance.

6.1 Monitoring Concept Document

  • Define key metrics to monitor
  • Define alerting thresholds
  • Define alert channels (email, Slack, etc.)
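Alerting thresholds are usually paired with a minimum duration so a brief spike does not page anyone. The sketch below illustrates that idea over equally spaced samples. The metric names and threshold values are placeholders for what 6.1 will decide, and the helpers are hypothetical, not part of Prometheus.

```python
from collections.abc import Mapping, Sequence

# Placeholder thresholds; Phase 6.1 is where the real values get decided.
THRESHOLDS = {
    "cpu_percent": 90.0,
    "memory_percent": 90.0,
    "disk_used_percent": 85.0,
}


def is_firing(samples: Sequence, threshold: float, for_samples: int) -> bool:
    """True when the last `for_samples` samples all exceed `threshold`.

    Requiring a sustained breach (similar in spirit to the `for:` duration on a
    Prometheus alerting rule) keeps one-off spikes from triggering an alert.
    """
    if for_samples < 1:
        raise ValueError("for_samples must be at least 1")
    recent = list(samples[-for_samples:])
    return len(recent) == for_samples and all(v > threshold for v in recent)


def firing_alerts(series: Mapping, for_samples: int = 3) -> list:
    """Names of metrics whose recent samples breach their threshold."""
    return sorted(
        name
        for name, samples in series.items()
        if name in THRESHOLDS and is_firing(samples, THRESHOLDS[name], for_samples)
    )
```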

6.2 Monitoring Setup

  • Deploy monitoring solution
  • Configure system metrics collection (CPU, RAM, disk, network)
  • Configure Gitea-specific metrics
  • Configure database metrics
  • Set up monitoring dashboards
  • Configure alerting rules

6.3 Testing

  • Simulate failure scenarios
  • Verify alerts trigger correctly
  • Validate dashboard accuracy

Goals:

  • Monitoring system operational with dashboards
  • Alerting configured and tested
  • Monitoring concept document completed

Phase 7: Logging Implementation

This phase implements centralized logging for all components.

7.1 Logging Concept Document

  • Define logging architecture
  • Define log retention policy
  • Define log analysis requirements

7.2 Logging Setup

  • Deploy centralized logging solution
  • Configure Gitea application logging
  • Configure reverse proxy access logs
  • Configure database logs
  • Configure system logs collection
  • Set up log parsing and indexing
  • Create log search and visualization dashboards

7.3 Testing

  • Verify logs are being collected
  • Test log search functionality
  • Test log-based alerts (if applicable)

Goals:

  • Centralized logging operational
  • All components sending logs to central system
  • Logging concept document completed

Phase 8: Redundancy and High Availability

This phase implements fail-safe operations and redundancy.

8.1 Redundancy Concept Document

  • Document single point of failure (SPOF) analysis
  • Design HA architecture
  • Define failover strategy
  • Define acceptable downtime

8.2 Redundancy Implementation (Optional/Simplified)

  • Implement database redundancy (replication/clustering) OR document approach
  • Implement application redundancy (multiple Gitea instances) OR document approach
  • Implement load balancing OR document approach
  • Document manual failover procedures

Goals:

  • Redundancy concept document completed
  • PoC or detailed plan for HA implementation
  • Failover procedures documented

Phase 9: Documentation and Final Testing

This phase consolidates all documentation and performs end-to-end testing.

9.1 Documentation

  • Create comprehensive README
  • Document architecture with diagrams
  • Document all procedures (deployment, updates, backup/restore, failover)
  • Create runbooks for common scenarios
  • Document interview discussion points

9.2 Final Testing

  • Perform end-to-end deployment test
  • Test all automated processes
  • Verify all documentation is accurate
  • Test system under load (optional)

9.3 Repository Organization

  • Store all code and docs in Gitea repository
  • Ensure repository is well-organized
  • Add proper README and documentation

Goals:

  • Complete documentation package
  • All automation tested and validated
  • Ready for interview presentation

Phase 10: Interview Preparation

This phase prepares for the interview discussion.

10.1 Preparation

  • Review all concept documents
  • Prepare to explain technology choices
  • Prepare architecture diagrams for presentation
  • Prepare to demonstrate the system
  • List lessons learned and trade-offs made
  • Prepare improvement suggestions

Goals:

  • Ready to discuss all aspects of the implementation
  • Demo environment functional and accessible
  • Confident in technology choices and concepts

Success Criteria

  • Gitea accessible via HTTPS through reverse proxy
  • Installation fully automated and reproducible
  • Automated updates configured and tested
  • Comprehensive concept documents for: Backup, Monitoring, Logging, Redundancy
  • At least one PoC implementation (optional but recommended)
  • All code and documentation in Git repository
  • System accessible to interviewer over internet