qvest-task/docs/backup-strategy.md
gitea_admin 2e368a3a7c feat: implement disaster recovery with automated restore (#2)
- Create restore.sh for automated S3 backup recovery
  - Fetches backups, stops services, restores database/data/config, restarts & validates
- Successfully tested on production system
- Document procedures in backup-strategy.md
- Add Test 6: Full backup/restore cycle with disaster simulation
- Rename test-update.sh → test-integration.sh

Co-authored-by: aviyadeveloper <aviya.developer@gmail.com>
Reviewed-on: #2
2026-06-11 17:29:55 +00:00


# Backup Strategy
## Overview
Implements the **3-2-1 rule**: 3 copies of data, on 2 different storage types, with 1 offsite.
| Copy | Location | Type | Retention |
|------|----------|------|-----------|
| 1 | EC2 (EBS) | Block Storage | Live |
| 2 | S3 Standard | Object Storage | Days 1-30 |
| 3 | S3 Glacier | Cold Storage | Days 31-90 |
## What is Backed Up
1. **PostgreSQL Database** (`database-*.sql.gz`) - All application data, users, repository metadata
2. **Gitea Data** (`gitea-data-*.tar.gz`) - Git repositories, LFS objects, attachments, SSH keys
3. **Configuration** (`config-*.tar.gz`) - docker-compose.yml, nginx configs, .env, scripts
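
As a minimal sketch of how the three archive names fit together, the helper below derives them from a single timestamp. The function names are hypothetical and not taken from `backup.sh`; the `YYYYMMDD_HHMMSS` format is inferred from the example timestamp `20260611_164408`, and UTC is an assumption.

```bash
# Hypothetical helper: prints the three archive names for one backup run.
# The name patterns come from the list above; the function name is illustrative.
backup_artifacts() {
  local ts="$1"
  printf '%s\n' \
    "database-${ts}.sql.gz" \
    "gitea-data-${ts}.tar.gz" \
    "config-${ts}.tar.gz"
}

# Assumed timestamp format (YYYYMMDD_HHMMSS), generated in UTC.
new_timestamp() {
  date -u +%Y%m%d_%H%M%S
}
```

For example, `backup_artifacts "$(new_timestamp)"` prints one file name per line, all sharing the same timestamp, which is what lets a restore pick a matching set.
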
## Backup Schedule
| Type | Frequency | Time | Script |
|------|-----------|------|--------|
| Automated | Daily | 02:00 UTC | `/opt/gitea/scripts/backup.sh` |
| Pre-Update | Before updates | Variable | Called by update scripts |
| Manual | On-demand | N/A | Run `backup.sh` manually |

**Location**: `s3://qvest-task-backups/backups/`
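
This document does not say which scheduler triggers the daily run. Assuming cron is used, the job runs as root, and the host clock is set to UTC, the 02:00 schedule could be expressed as a `cron.d` entry. The function name below is hypothetical.

```bash
# Prints a cron.d-style entry for the daily backup.
# Fields: minute hour day-of-month month day-of-week user command.
# Assumptions: cron is the scheduler, the job runs as root, the host clock is UTC.
backup_cron_entry() {
  printf '%s\n' '0 2 * * * root /opt/gitea/scripts/backup.sh'
}
```

The output could be installed with something like `backup_cron_entry | sudo tee /etc/cron.d/gitea-backup`, where the file name is only an illustration.
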
## Retention & Lifecycle
```
Days 1-30: S3 Standard (instant access)
Days 31-90: S3 Glacier (retrieval: minutes to hours)
Day 91+: Automatically deleted
```
The lifecycle rules are managed by Terraform (`terraform/storage.tf`). S3 versioning is enabled, and noncurrent object versions expire after 30 days.
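
To estimate where a given backup currently lives (and therefore how long retrieval may take), the lifecycle above can be sketched as a small lookup. This is an illustration only: the function name is hypothetical, and it assumes day 90 is the last day an object exists.

```bash
# Maps a backup's age in days to the tier described in the lifecycle above.
# Assumption: transition after day 30, deletion after day 90.
storage_tier() {
  # Reject anything that is not a non-negative integer.
  [[ "$1" =~ ^[0-9]+$ ]] || return 1
  local age_days=$((10#$1))   # force base 10 so "08" is not read as octal
  if (( age_days <= 30 )); then
    echo "S3 Standard"
  elif (( age_days <= 90 )); then
    echo "S3 Glacier"
  else
    echo "expired"
  fi
}
```

The rules actually applied to the bucket can be inspected with `aws s3api get-bucket-lifecycle-configuration --bucket qvest-task-backups`, which is the authoritative source if it disagrees with this sketch.
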
## Restore Procedures
### Quick Restore
```bash
# List available backups
sudo /opt/gitea/scripts/restore.sh
# Restore specific backup
sudo /opt/gitea/scripts/restore.sh <timestamp>
# Example: sudo /opt/gitea/scripts/restore.sh 20260611_164408
```
The script will:
1. Prompt for confirmation
2. Download backups from S3
3. Stop services
4. Restore database, data, and configuration
5. Restart and verify services
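
The first step, confirming before anything is touched, might look like the sketch below. It is not the code from `restore.sh`: both function names and the prompt wording are hypothetical, and the timestamp format is inferred from the example above.

```bash
# Succeeds if the argument looks like YYYYMMDD_HHMMSS (format inferred from the example).
is_valid_timestamp() {
  [[ "$1" =~ ^[0-9]{8}_[0-9]{6}$ ]]
}

# Reads one line from stdin and succeeds only on an explicit "yes".
# The prompt goes to stderr so stdout stays clean for callers.
confirm_restore() {
  local ts="$1" answer
  printf 'Restore backup %s? Type "yes" to continue: ' "$ts" >&2
  read -r answer
  [[ "$answer" == "yes" ]]
}
```

A caller could chain them as `is_valid_timestamp "$ts" && confirm_restore "$ts" || exit 1`, so that a typo in the timestamp or any answer other than `yes` aborts before services are stopped.
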
## Disaster Recovery Scenarios
### Database Corruption
**Solution**: Database-only restore
### Repository Deletion
**Solution**: Full restore (database + data must match)
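
Because the database and repository data must come from the same run, it can help to check that a timestamp has all three archives before restoring. The sketch below assumes the archive naming listed under "What is Backed Up"; the function name is hypothetical.

```bash
# Reads object names (one per line) on stdin and prints every timestamp
# for which the database, data, and config archives are all present.
complete_backups() {
  local names ts
  names="$(cat)"
  grep -oE '[0-9]{8}_[0-9]{6}' <<<"$names" | sort -u | while read -r ts; do
    if grep -qxF "database-${ts}.sql.gz" <<<"$names" &&
       grep -qxF "gitea-data-${ts}.tar.gz" <<<"$names" &&
       grep -qxF "config-${ts}.tar.gz" <<<"$names"; then
      echo "$ts"
    fi
  done
}
```

Assuming the archives sit directly under the backup prefix, the input could come from `aws s3 ls s3://qvest-task-backups/backups/ | awk '{print $4}' | complete_backups`.
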
### Complete Instance Failure
**Solution**: Rebuild infrastructure + restore

**Steps**:
1. `terraform apply`
2. `ansible-playbook site.yml`
3. `sudo /opt/gitea/scripts/restore.sh <timestamp>`
4. Update DNS if needed
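
The first three steps above can be sketched as one function with a dry-run switch, so the sequence can be reviewed before it is executed. The `run` and `rebuild_and_restore` names and the `DRY_RUN` variable are hypothetical. This document does not state the working directories for Terraform and Ansible, so change into them as needed. The DNS update stays manual.

```bash
# Runs a command, or only prints it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Rebuilds and restores in the order listed above; stops at the first failing step.
rebuild_and_restore() {
  local ts="$1"
  run terraform apply &&
    run ansible-playbook site.yml &&
    run sudo /opt/gitea/scripts/restore.sh "$ts"
}
```

Calling `DRY_RUN=1 rebuild_and_restore 20260611_164408` prints the three commands prefixed with `+` without running them.
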
## Security
- **Encryption**: S3 server-side AES-256 encryption enabled
- **Access**: The EC2 IAM role currently has `AmazonS3FullAccess`; consider restricting it to the backup bucket
- **Data Sensitivity**: Backups contain passwords, SSH keys, API tokens - restrict S3 bucket access

⚠️ **Note**: Config backups include the `.env` file with secrets, so restrict access to the S3 bucket accordingly.
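
One way to tighten the role is a policy scoped to the backup bucket and prefix. The sketch below only prints such a policy. It assumes the backup and restore scripts need nothing beyond listing, reading, and writing objects, which should be verified against the scripts. The function name is hypothetical.

```bash
# Prints an IAM policy limited to one bucket and one key prefix.
# Assumption: the scripts only need s3:ListBucket, s3:GetObject, and s3:PutObject.
backup_bucket_policy() {
  local bucket="$1" prefix="${2:-backups}"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::${bucket}"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::${bucket}/${prefix}/*"
    }
  ]
}
EOF
}
```

`backup_bucket_policy qvest-task-backups` produces a document that could replace the full-access policy. Since the infrastructure is managed by Terraform, the equivalent change would normally be made there.
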
## Document History
| Version | Date | Changes |
|---------|------|---------|
| 1.0 | 2026-06-11 | Initial backup strategy |