Roadmap
This is the implementation roadmap for the project. It outlines the key milestones and features in incremental steps, allowing for a structured approach to development and deployment.
Phase 1: Conceptualization and Planning
This phase will be achieved through discussion and research and will include the following steps (no code should be implemented in this phase):
1.1 Requirements Analysis
- Define the scope and requirements of the project
- Identify constraints and non-functional requirements
- Determine host environment (cloud provider, VPS, or local)
1.2 Technology Selection ✅
Decisions documented in ADR.md
- Cloud: AWS
- Infrastructure as Code: Terraform
- Configuration Management: Ansible (kept minimal)
- Application Deployment: Docker + Docker Compose
- Database: PostgreSQL (self-hosted in Docker)
- Reverse Proxy: Nginx
- SSL: Let's Encrypt with certbot
- Update Automation: Diun + Custom Scripts
- Monitoring: Prometheus + Grafana (later phase)
- Logging: Loki + Promtail (later phase)
- Backup: Custom scripts + S3 (later phase)
1.3 Architecture Design ✅
- ✅ Overall system architecture designed
- ✅ Network topology planned (VPC, subnets, security groups)
- ✅ Three architecture diagrams created in docs/diagrams/
1.4 Project Structure ✅
- Directory structure planned (will create incrementally per phase)
- Documentation structure in place (docs/diagrams/)
- Naming conventions: lowercase, hyphens for files, descriptive names
Goals:
- ✅ A clear full Roadmap for the project available in this file
- ✅ Technology stack documented with rationale (see ADR.md)
- ✅ Architecture diagrams created (3 diagrams in docs/diagrams/)
- ✅ Project structure planned
Phase 1 Complete! Ready to begin Phase 2 (Infrastructure Setup).
Phase 2: Infrastructure Setup
This phase provisions the AWS infrastructure using Terraform.
2.1 Terraform Backend Setup ✅
- Configure AWS CLI and credentials locally
- Set up Terraform backend (S3 bucket for state storage)
- Initialize Terraform working directory
2.2 Core Infrastructure ✅
- ✅ Create VPC with single public subnet
- ✅ Set up Internet Gateway
- ✅ Configure Security Group for EC2 (ports 22, 80, 443)
- ✅ Provision EC2 instance (t3.medium, Ubuntu 24.04) with IAM role
- ✅ Create S3 bucket for backups (with versioning & encryption)
- ✅ Configure Route 53 DNS records (A record: git.poll-streams.com → EC2)
- ✅ Use official Terraform AWS modules (VPC, Security Group)
- ✅ Refactored into separate files: main.tf, vpc.tf, security.tf, compute.tf, storage.tf, iam.tf, dns.tf, outputs.tf
2.3 Security Configuration ✅
- ✅ Configure SSH key-based authentication (Ed25519, generated via Terraform)
- ✅ SSH access from anywhere (0.0.0.0/0) - security via key-based auth
- ✅ Apply IAM policies (AmazonS3FullAccess for EC2 backups)
- ✅ Security group follows least privilege (only 22, 80, 443 inbound; all outbound)
- ✅ Encrypted EBS root volume (30GB gp3)
Goals: ✅
- ✅ AWS infrastructure fully defined in Terraform code
- ✅ EC2 instance provisioned and accessible via SSH
- ✅ S3 backup bucket created
- ✅ Domain DNS configured and resolving
- ✅ Infrastructure can be destroyed and recreated with terraform apply
Phase 2 Complete! Ready to begin Phase 3 (Automated Gitea Deployment).
Phase 3: Automated Gitea Deployment
This phase implements the automated, reproducible Gitea installation.
3.1 Database Setup ✅
- ✅ PostgreSQL 18.4 deployed via Docker Compose
- ✅ Database credentials stored in AWS Secrets Manager
- ✅ Random password generation via Terraform
- ✅ Volume mounted at /var/lib/postgresql (PostgreSQL 18+ requirement)
- ✅ Health checks configured with pg_isready
3.2 Gitea Installation ✅
- ✅ Gitea 1.22.6 deployed via Docker Compose
- ✅ Ansible playbooks created: setup-system.yml, deploy-gitea.yml, setup-ssl.yml, site.yml
- ✅ Docker + AWS CLI installation automated
- ✅ Gitea configured with environment variables (database, domain, ROOT_URL)
- ✅ SSH git access on port 2222
- ✅ Volumes for persistent data
3.3 Reverse Proxy Configuration ✅
- ✅ Nginx 1.27-alpine deployed via Docker Compose
- ✅ Let's Encrypt SSL certificate obtained via certbot (production)
- ✅ Domain: git.poll-streams.com (migrated to avoid rate limits)
- ✅ Two-stage nginx config (HTTP-only for ACME, then HTTPS)
- ✅ SSL termination at nginx, proxy to Gitea on port 3000
- ✅ HTTP to HTTPS redirect configured
- ✅ Security headers (HSTS, X-Frame-Options, etc.)
- ✅ WebSocket support for real-time features
- ✅ 512MB upload limit
3.4 Testing ✅
- ✅ HTTPS access verified: https://git.poll-streams.com
- ✅ Valid SSL certificate (Let's Encrypt production)
- ✅ HTTP → HTTPS redirect working
- ✅ Gitea web interface accessible and functional
- ✅ User account created, repository created
- ✅ Git push via HTTPS tested successfully
- ✅ Full deployment reproducible via ansible-playbook site.yml
Goals: ✅
- ✅ Gitea running and accessible via HTTPS through reverse proxy
- ✅ Installation fully automated and reproducible
- ✅ Production-grade deployment with SSL
Phase 3 Complete! Gitea is fully deployed, secured with SSL, and accessible from the internet.
Phase 4: Update Automation ✅
This phase implements automated update mechanisms for Gitea and related components.
4.1 Update Strategy Design ✅
- ✅ Weekly update checks (Sunday 3:00 AM)
- ✅ Per-container update policies (automatic vs manual)
- ✅ Pre-update backup to S3
- ✅ Post-update health checks
- ✅ Automatic rollback on failure
- ✅ Email notifications via AWS SES
4.2 Update Monitoring ✅
- ✅ Diun 4.33 deployed for Docker image update detection
- ✅ Scheduled weekly checks (cron: 0 3 * * 0)
- ✅ Monitors: postgres, gitea, nginx, diun
- ✅ Email notifications configured via AWS SES SMTP
- ✅ IAM user created for SMTP credentials
- ✅ Labels define update policies per container
4.3 Automated Scripts ✅
- ✅ backup.sh: Database + Gitea data backup to S3 bucket
- ✅ health-check.sh: Validates all services running and responsive
- ✅ auto-update.sh: Automatic updates for low-risk containers (nginx)
  - Backup before update
  - Pull new image
  - Recreate container
  - Health check validation
  - Automatic rollback on failure
  - Email notifications
- ✅ manual-update.sh: Manual updates for critical containers (gitea/postgres)
  - Operator confirmation required
  - Same safety flow as auto-update
  - Success/failure notifications
- ✅ test-integration.sh: Comprehensive integration test suite for CI/CD
  - Script syntax validation (bash -n)
  - Docker Compose configuration validation
  - Backup archive creation and validation
  - Health check failure detection
  - Update workflow with rollback simulation
  - Full backup and restore cycle testing (22 assertions total)
  - Isolated test environment (/tmp)
  - No dependencies on live services
- ✅ restore.sh: Disaster recovery from S3 backups
  - Downloads latest backups from S3
  - Restores database, Gitea data, and configuration
  - Service stop/start orchestration
  - Tested successfully on live system (timestamp 20260611_164408)
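The safety flow shared by auto-update.sh and manual-update.sh (backup, pull, recreate, health check, rollback) could be sketched roughly as below. This is an illustration under assumptions, not the project's script: the step function names, the relative script paths, the return codes, and the rollback method (re-tagging the previously used image) are all hypothetical. Each step is its own function so it can be stubbed in tests.

```shell
#!/usr/bin/env bash
# Sketch of the update flow: backup -> pull -> recreate -> health check -> rollback.
# Script paths, function names and the rollback method are assumptions.

# Stand-in for the email notification; here it only logs to stderr.
notify() { printf 'update %s: %s\n' "$1" "$2" >&2; }

step_backup()   { ./backup.sh; }
step_pull()     { docker compose pull "$1"; }
step_recreate() { docker compose up -d --no-deps "$1"; }
step_health()   { ./health-check.sh; }

# Remember which image the running container uses so it can be restored later.
step_snapshot() {
  local cid
  cid="$(docker compose ps -q "$1")" || return 1
  [ -n "$cid" ] || return 1
  PREV_IMAGE_ID="$(docker inspect --format '{{.Image}}' "$cid")" || return 1
  IMAGE_REF="$(docker inspect --format '{{.Config.Image}}' "$cid")" || return 1
}

# Point the tag back at the previous image and recreate the container.
step_rollback() {
  docker tag "$PREV_IMAGE_ID" "$IMAGE_REF" && docker compose up -d --no-deps "$1"
}

# Returns 0 = updated, 1 = aborted before any change,
#         2 = rolled back, 3 = rollback itself failed.
run_update() {
  local service="$1"
  step_backup "$service"   || { notify "$service" "aborted: backup failed"; return 1; }
  step_snapshot "$service" || { notify "$service" "aborted: no image snapshot"; return 1; }
  step_pull "$service"     || { notify "$service" "aborted: pull failed"; return 1; }

  if step_recreate "$service" && step_health "$service"; then
    notify "$service" "updated"
    return 0
  fi

  step_rollback "$service" || { notify "$service" "ROLLBACK FAILED"; return 3; }
  notify "$service" "rolled back"
  return 2
}
```

A cron job would call `run_update nginx`; a manual variant could ask for operator confirmation before calling the same function.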
Script Quality:
- All scripts follow DRY principles with extracted helper functions
- Consistent error handling and logging patterns
- Configurable timeouts; magic numbers replaced with named constants
- Comprehensive comments and documentation headers
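As an illustration of the "named constants" and "extracted helper" points, a retry helper for health checks might look like the sketch below. The constant names, their default values, and the `retry` and `check_gitea` functions are hypothetical; only the URL comes from this roadmap.

```shell
#!/usr/bin/env bash
# Named constants instead of magic numbers; each can be overridden from the environment.
HEALTH_MAX_ATTEMPTS="${HEALTH_MAX_ATTEMPTS:-10}"
HEALTH_RETRY_DELAY_SECONDS="${HEALTH_RETRY_DELAY_SECONDS:-3}"
HEALTH_REQUEST_TIMEOUT_SECONDS="${HEALTH_REQUEST_TIMEOUT_SECONDS:-5}"
HEALTH_URL="${HEALTH_URL:-https://git.poll-streams.com}"

# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS runs have failed.
retry() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 0
}

# Example check: succeeds once the web interface answers with a non-error status.
check_gitea() {
  retry "$HEALTH_MAX_ATTEMPTS" "$HEALTH_RETRY_DELAY_SECONDS" \
    curl -fsS -o /dev/null --max-time "$HEALTH_REQUEST_TIMEOUT_SECONDS" "$HEALTH_URL"
}
```

Keeping the retry logic in one helper means every script waits and gives up in the same way, and the limits can be tuned per environment without editing the code.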
4.4 Cron Jobs ✅
- ✅ Weekly automatic update (nginx only): Sunday 3:15 AM
- ✅ Weekly certificate renewal: Sunday 3:30 AM
- ✅ Daily backups: 2:00 AM
- ✅ All configured via Ansible (setup-cron.yml)
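For reference, the schedule above corresponds to crontab entries like the ones this sketch prints. The times are taken from the list; `SCRIPTS_DIR`, `COMPOSE_DIR`, and the exact commands are assumptions, since the real entries are generated by setup-cron.yml.

```shell
#!/usr/bin/env bash
# Prints crontab entries matching the schedule in this section.
# SCRIPTS_DIR and COMPOSE_DIR are hypothetical install locations.
SCRIPTS_DIR="${SCRIPTS_DIR:-/opt/gitea/scripts}"
COMPOSE_DIR="${COMPOSE_DIR:-/opt/gitea}"

print_crontab() {
  cat <<EOF
# min hour day-of-month month day-of-week command
0 2 * * * ${SCRIPTS_DIR}/backup.sh
15 3 * * 0 ${SCRIPTS_DIR}/auto-update.sh
30 3 * * 0 cd ${COMPOSE_DIR} && docker compose run --rm certbot renew && docker compose restart nginx
EOF
}
```

Day-of-week `0` is Sunday, so the last two entries run weekly, after the 3:00 AM Diun check and after the 2:00 AM daily backup.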
4.5 Certificate Renewal ✅
- ✅ Automated weekly renewal check via cron
- ✅ Uses certbot container: docker compose run --rm certbot renew
- ✅ Restarts nginx to load new certificates
- ✅ Process is idempotent (safe to run weekly)
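A minimal sketch of the renewal job, assuming it runs from the directory that holds the Compose file. The `renew_certificates` function and the `COMPOSE` variable are hypothetical; the variable exists only so the flow can be dry-run without Docker.

```shell
#!/usr/bin/env bash
# Sketch of the weekly renewal job. COMPOSE can be overridden for dry runs.
COMPOSE="${COMPOSE:-docker compose}"

renew_certificates() {
  # certbot only replaces certificates that are close to expiry,
  # which is why running this every week is safe.
  $COMPOSE run --rm certbot renew || return 1
  # Restart nginx so it loads any renewed certificate files.
  $COMPOSE restart nginx
}
```

Setting `COMPOSE=echo` before calling `renew_certificates` prints the two commands instead of running them.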
4.6 Testing & Validation ✅
- ✅ Integration tests created (test-integration.sh)
- ✅ All scripts tested on live system
- ✅ Cron jobs verified
- ✅ Email notifications tested
- ✅ Diun monitoring confirmed (4 containers)
- ✅ Update workflow diagram created
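The syntax-validation step of the integration tests relies on `bash -n`, which parses a script without executing it. A rough sketch of such a check follows; the `validate_syntax` name and the PASS/FAIL output format are assumptions, not the contents of test-integration.sh.

```shell
#!/usr/bin/env bash
# Syntax-check every script passed as an argument without executing it.
# Succeeds only if all scripts parse cleanly.
validate_syntax() {
  local failed=0 script
  for script in "$@"; do
    if bash -n "$script" 2>/dev/null; then
      echo "PASS syntax: $script"
    else
      echo "FAIL syntax: $script"
      failed=$((failed + 1))
    fi
  done
  [ "$failed" -eq 0 ]
}
```

For example, `validate_syntax scripts/*.sh` (with a hypothetical `scripts/` directory) would report one line per script and return a non-zero status if any script fails to parse, which is enough to fail a CI job.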
4.7 CI/CD Implementation ✅
- ✅ Gitea Actions enabled on instance
- ✅ Self-hosted runners deployed (2x act_runner v0.2.10)
- ✅ Runner automation via Ansible (setup-runner.yml)
- ✅ Systemd services for runner management
- ✅ Host networking configuration for job containers
- ✅ CI workflow created (.gitea/workflows/test.yml)
- ✅ Automated testing on pull requests
- ✅ Docker layer caching for performance
- ✅ Artifact upload on test failure
- ✅ Full CI/CD pipeline tested and operational
Goals:
- ✅ Automated update system operational
- ✅ Update process tested and validated on live system
- ✅ Rollback procedure implemented and tested
- ✅ Quality gate for CI/local environments
- ✅ CI/CD pipeline with self-hosted runners
- ✅ Documentation complete (workflow diagram)
Implementation Summary:
- 6 bash scripts following best practices (DRY, error handling, logging)
- Diun monitoring with AWS SES email notifications
- Per-container update policies (automatic: nginx, manual: gitea/postgres)
- Pre-update backups with automatic rollback on failure
- Certificate renewal automation
- Comprehensive testing framework
- CI/CD with Gitea Actions and 2 self-hosted runners
- Visual workflow documentation (including CI/CD flow)
Phase 4 Complete! Update automation and CI/CD fully operational with safety mechanisms.
Phase 5: Backup Strategy Implementation ✅
This phase implements comprehensive backup solutions.
5.1 Backup Concept Document ✅
- ✅ Document backup strategy (3-2-1 rule)
- ✅ Define backup scope (database, repos, config, etc.)
- ✅ Define retention policy
- ✅ Define RTO and RPO targets
5.2 Backup Implementation ✅
- ✅ Automate database backups (pg_dump)
- ✅ Automate Gitea data directory backups (tar.gz)
- ✅ Automate configuration backups (docker-compose.yml, .env, scripts)
- ✅ Set up backup storage (S3 with versioning)
- ✅ Implement backup rotation and cleanup (S3 lifecycle policy)
- ✅ Schedule automated backups (daily 2:00 AM cron)
- ✅ Pre-update backups integrated into update workflow
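The database and data-directory steps above could be sketched as follows. This is not backup.sh itself: the bucket name, paths, file naming, database user and name, and the `postgres` Compose service name are assumptions. The timestamp format matches the one used elsewhere in this roadmap (for example 20260611_164408).

```shell
#!/usr/bin/env bash
# Sketch of a backup run: dump the database, archive the data directory, upload both.
# All names and paths below are hypothetical defaults.
BACKUP_BUCKET="${BACKUP_BUCKET:-s3://example-gitea-backups}"
WORK_DIR="${WORK_DIR:-/tmp/gitea-backup}"
GITEA_DATA_DIR="${GITEA_DATA_DIR:-/opt/gitea/data}"
DB_USER="${DB_USER:-gitea}"
DB_NAME="${DB_NAME:-gitea}"

timestamp() { date +%Y%m%d_%H%M%S; }

# $1 = output file (plain SQL); the result is compressed to "$1.gz".
dump_database() {
  docker compose exec -T postgres pg_dump -U "$DB_USER" "$DB_NAME" > "$1" || return 1
  gzip -f "$1"
}

# $1 = directory to archive, $2 = output .tar.gz
# Paths inside the archive are relative to the directory's parent.
archive_dir() {
  tar -czf "$2" -C "$(dirname "$1")" "$(basename "$1")"
}

upload() { aws s3 cp "$1" "${BACKUP_BUCKET}/$(basename "$1")"; }

run_backup() {
  local ts
  ts="$(timestamp)"
  mkdir -p "$WORK_DIR" || return 1
  dump_database "$WORK_DIR/db_${ts}.sql" || return 1
  archive_dir "$GITEA_DATA_DIR" "$WORK_DIR/data_${ts}.tar.gz" || return 1
  upload "$WORK_DIR/db_${ts}.sql.gz" && upload "$WORK_DIR/data_${ts}.tar.gz"
}
```

Writing the dump to a file before compressing it keeps a failed pg_dump from being hidden behind a successful gzip.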
5.3 Recovery Testing ✅
- ✅ Document restore procedures (docs/backup-strategy.md + restore.sh script)
- ✅ Test database restore on live system (timestamp: 20260611_164408)
- ✅ Test full system restore (database + data + config)
- ✅ Verify services operational post-restore (all containers healthy)
- ✅ Document recovery time (RTO: ~45 minutes, RPO: 24 hours)
- ✅ Integration test suite includes full backup/restore cycle validation
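A restore along these lines could be sketched as below. It is not restore.sh: the bucket, the `db_<timestamp>.sql.gz` and `data_<timestamp>.tar.gz` naming, the service names, and the target path are assumptions. It also assumes the SQL dump can be replayed into the existing database, for example because it was created with pg_dump's `--clean --if-exists` options.

```shell
#!/usr/bin/env bash
# Sketch of a restore: pick the newest backup by name, then load it.
# All names and paths below are hypothetical defaults.
BACKUP_BUCKET="${BACKUP_BUCKET:-s3://example-gitea-backups}"
DB_USER="${DB_USER:-gitea}"
DB_NAME="${DB_NAME:-gitea}"
GITEA_DATA_PARENT="${GITEA_DATA_PARENT:-/opt/gitea}"

# Reads object names on stdin and prints the newest one for prefix $1.
# YYYYMMDD_HHMMSS timestamps sort chronologically when sorted as text.
latest_backup() {
  grep "^${1}_" | LC_ALL=C sort | tail -n 1
}

# The object name is the fourth column of "aws s3 ls" output.
list_backups() { aws s3 ls "${BACKUP_BUCKET}/" | awk '{print $4}'; }

restore_latest() {
  local work names db data
  work="$(mktemp -d)" || return 1
  names="$(list_backups)" || return 1
  db="$(printf '%s\n' "$names" | latest_backup db)"
  data="$(printf '%s\n' "$names" | latest_backup data)"
  if [ -z "$db" ] || [ -z "$data" ]; then
    echo "no backups found" >&2
    return 1
  fi
  aws s3 cp "${BACKUP_BUCKET}/${db}" "$work/" || return 1
  aws s3 cp "${BACKUP_BUCKET}/${data}" "$work/" || return 1

  # Stop the application while its database and files are replaced.
  docker compose stop gitea || return 1
  gunzip -c "$work/$db" \
    | docker compose exec -T postgres psql -U "$DB_USER" "$DB_NAME" || return 1
  tar -xzf "$work/$data" -C "$GITEA_DATA_PARENT" || return 1
  docker compose start gitea
}
```

Selecting backups by name rather than by S3 modification time keeps the choice predictable even if objects are copied or re-uploaded later.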
Goals:
- ✅ Automated backup system operational
- ✅ Restore procedures tested and documented
- ✅ Backup strategy document completed (docs/backup-strategy.md - 145 lines, concise)
- ✅ Disaster recovery validated on production system
Phase 5 Complete! Backup and restore fully operational and validated.
Phase 6: Monitoring Concept 🔄
This phase documents a monitoring strategy for future implementation.
6.1 Monitoring Concept Document 🔄
- 🔄 Define key metrics to monitor (CPU, RAM, disk, network, Gitea-specific)
- 🔄 Define alerting thresholds and conditions
- 🔄 Define alert channels (email, Slack, etc.)
- 🔄 Technology selection (Prometheus + Grafana)
- 🔄 Architecture design (exporters, retention, dashboards)
- 🔄 Implementation plan and effort estimation
Goals:
- 🔄 Monitoring concept document completed (docs/monitoring-concept.md)
- 🔄 Clear roadmap for future monitoring implementation
Note: Full implementation deferred - concept document shows architectural understanding and planning.
Phase 7: Logging Concept 🔄
This phase documents a centralized logging strategy for future implementation.
7.1 Logging Concept Document 🔄
- 🔄 Define logging architecture (Loki + Promtail)
- 🔄 Define log sources (Gitea, nginx, PostgreSQL, system)
- 🔄 Define log retention policy
- 🔄 Define log analysis requirements and use cases
- 🔄 Integration with Grafana for visualization
- 🔄 Implementation plan and resource requirements
Goals:
- 🔄 Logging concept document completed (docs/logging-concept.md)
- 🔄 Clear roadmap for future logging implementation
Note: Full implementation deferred - concept document shows architectural understanding and planning.
Phase 8: High Availability Concept 🔄
This phase documents a high availability strategy for future implementation.
8.1 HA Concept Document 🔄
- 🔄 Document SPOF (Single Points of Failure) analysis
- 🔄 Design HA architecture (Multi-AZ, load balancing)
- 🔄 Database redundancy strategy (RDS Multi-AZ or PostgreSQL replication)
- 🔄 Application redundancy (multiple Gitea instances)
- 🔄 Shared storage considerations (EFS or S3 for Gitea data)
- 🔄 Load balancer configuration (ALB)
- 🔄 Define failover strategy and automation
- 🔄 Define RTO/RPO targets for HA scenario
- 🔄 Cost analysis and trade-offs
Goals:
- 🔄 HA concept document completed (docs/ha-concept.md)
- 🔄 Clear architecture for scaling to high availability
Note: Full implementation deferred - concept document shows architectural understanding and planning.
Phase 9: Documentation and Final Testing ✅
This phase consolidates all documentation and performs end-to-end testing.
9.1 Documentation ✅
- ✅ Create comprehensive README.md
  - Project overview and objectives
  - Architecture summary
  - Prerequisites and setup instructions
  - Deployment procedures
  - Operational procedures
  - Troubleshooting guide
- ✅ Document architecture with diagrams (4 diagrams in docs/diagrams/)
- ✅ Document all decisions (ADR.md)
- ✅ Document all procedures (deployment, updates, backup/restore)
- ✅ Backup strategy documentation (docs/backup-strategy.md - 152 lines)
- ✅ Future enhancements (monitoring, logging, HA concept docs created)
9.2 Final Testing ✅
- ✅ Perform end-to-end deployment test (make configure tested)
- ✅ Test all automated processes (updates, backups, CI/CD)
- ✅ Verify all automation is functional
- ✅ System accessible via HTTPS with production SSL
9.3 Repository Organization ✅
- ✅ Well-organized directory structure
- ✅ Clear separation of concerns (terraform, ansible, docker, scripts)
- 🔄 Comprehensive README.md
Goals:
- 🔄 Complete documentation package
- ✅ All automation tested and validated
- 🔄 Ready for interview presentation
Phase 10: Interview Preparation
This phase prepares for the interview discussion.
10.1 Preparation
- Review all concept documents
- Prepare to explain technology choices
- Prepare architecture diagrams for presentation
- Prepare to demonstrate the system
- List lessons learned and trade-offs made
- Prepare improvement suggestions
Goals:
- Ready to discuss all aspects of the implementation
- Demo environment functional and accessible
- Confident in technology choices and concepts
Success Criteria
- ✅ Gitea accessible via HTTPS through reverse proxy (production SSL)
- ✅ Installation fully automated and reproducible (Terraform + Ansible)
- ✅ Automated updates configured and tested (Diun + custom scripts)
- ✅ CI/CD pipeline operational (Gitea Actions with self-hosted runners)
- ✅ Automated backups implemented (daily to S3)
- 🔄 Comprehensive concept documents for: Backup, Monitoring, Logging, HA
- ✅ All code in version control with proper structure
- ✅ System accessible to interviewer over internet (https://git.poll-streams.com)
- 🔄 Complete README.md with deployment and operational procedures
Current Status: Production-ready system with comprehensive automation. Completing final documentation phase before interview.
Remaining Work (Phase 9 Completion)
Documentation Tasks
- README.md - Comprehensive project documentation
  - Overview and objectives
  - Architecture summary with diagram references
  - Prerequisites and deployment guide
  - Operational procedures (updates, backups, troubleshooting)
- docs/backup-strategy.md - Complete backup documentation
  - 3-2-1 backup strategy
  - RTO/RPO targets
  - Backup scope and retention policy
  - Restore procedures with step-by-step instructions
  - S3 lifecycle policy for rotation
  - Configuration backup automation
- docs/monitoring-concept.md - Future monitoring architecture
  - Prometheus + Grafana architecture
  - Key metrics and alerting thresholds
  - Implementation plan
- docs/logging-concept.md - Future logging architecture
  - Loki + Promtail architecture
  - Log sources and retention
  - Implementation plan
- docs/ha-concept.md - High availability design
  - SPOF analysis
  - Multi-AZ architecture with load balancing
  - Database replication strategy
  - Cost/benefit analysis
Estimated Completion: 2-3 hours