Hey, great topic!
For our setup (Docker + MariaDB), here’s what’s been helpful:
Monit to watch for crashed containers or resource spikes
UFW + Fail2ban for basic security and brute-force prevention
Telegraf + Grafana for CPU, RAM, and user load monitoring
Attune to script regular backup jobs and run updates across containers
Rclone to sync TS backups to AWS S3 automatically
Ansible for provisioning the whole stack across test/dev
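In case it helps anyone, here is a rough Python sketch of the Rclone step, not our exact script. It just wraps `rclone sync <local dir> <remote>:<path>`. The remote name `s3backup`, the bucket path, and the local backup directory are all placeholders I made up for illustration, and it assumes you have already set up an S3 remote with `rclone config`.

```python
import subprocess


def build_rclone_sync(backup_dir: str, remote: str, bucket_path: str) -> list[str]:
    """Build the argv for `rclone sync <backup_dir> <remote>:<bucket_path>`.

    `remote` is the name of a remote already defined via `rclone config`
    (hypothetical here), and `bucket_path` is the bucket plus optional prefix.
    """
    return ["rclone", "sync", backup_dir, f"{remote}:{bucket_path}"]


def sync_backups(backup_dir: str, remote: str, bucket_path: str) -> None:
    """Run the sync; raises CalledProcessError if rclone exits non-zero."""
    subprocess.run(build_rclone_sync(backup_dir, remote, bucket_path), check=True)
```

You would call something like `sync_backups("/srv/backups/ts", "s3backup", "my-bucket/ts")` from whatever scheduler you use. Keep in mind that `sync` makes the destination match the source, so files deleted locally are also removed from the bucket; `rclone copy` is the safer choice if you want to keep old backups there.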
I’d also recommend scheduling restarts during off-hours. We do it via cron, with the schedule varying by environment.
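For the restarts, a minimal sketch of what a cron-invoked script could look like, assuming plain `docker restart` is enough for your containers. The container names and the 03:00 to 05:00 window are made-up examples, not our real values. The time check is a safety net so that running the script by hand in the middle of the day does nothing.

```python
import subprocess
from datetime import time


def in_off_hours(now: time, start: time, end: time) -> bool:
    """True if `now` is in [start, end); handles windows that cross midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def restart_command(containers: list[str]) -> list[str]:
    """Build the argv for `docker restart <container> [<container> ...]`."""
    return ["docker", "restart", *containers]


def restart_if_off_hours(
    containers: list[str],
    now: time,
    start: time = time(3, 0),  # example window start, adjust per environment
    end: time = time(5, 0),    # example window end, adjust per environment
) -> bool:
    """Restart the containers only inside the off-hours window.

    Returns True if a restart was issued, False if it was skipped.
    """
    if not in_off_hours(now, start, end):
        return False
    subprocess.run(restart_command(containers), check=True)
    return True
```

A small wrapper script would call `restart_if_off_hours(["mariadb", "app"], datetime.now().time())`, and a crontab schedule such as `0 3 * * *` would run it at 03:00 every day. Different environments can simply use different crontab schedules and windows.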