Head of Site Reliability Engineering

Remote Full-time
We’re an award-winning global outsourcer providing contact center and back-office services to clients around the world. Come work at a place where innovation and teamwork come together to support the most exciting missions in the world!

Role Objective

The Head of Site Reliability Engineering is a hybrid technical-leadership role. You will:
• Own the reliability of production services running on AWS while steering the platform-resilience roadmap and building out the SRE team.
• Lead and grow a remote team of SREs: coaching, hiring, performance-managing, and fostering a blameless culture.
• Set and enforce Service Level Objectives (SLOs), error budgets, and incident response processes.
• Drive automation through Infrastructure-as-Code (Pulumi / TypeScript), CI/CD, and observability pipelines.
• Represent the SRE discipline to product, engineering, and senior leadership across our global business.
• Stay hands-on with monitoring and incident response, which will remain critical as the team grows.

This role offers the opportunity to build reliability engineering from the ground up on a mission-critical IoT platform.

Key Responsibilities

Leadership & People Management
• Build an initial SRE team of 3-6 engineers, covering goal setting, career development, regular 1:1s, and annual performance reviews.
• Ensure operational knowledge of our systems is documented and that the team stays "fresh" on operating and troubleshooting procedures.
• Recruit, onboard, and mentor new engineers; scale the team to meet business growth.
• Maintain an inclusive, psychologically safe culture centred on learning and continuous improvement.
• Own, and participate in, the team's on-call roster, ensuring equitable rotations and sustainable workloads.

Service Level Management & Reliability
• Define, monitor, and enforce SLOs and error budgets across all production systems.
• Continuously analyse error-budget burn to halt risky deployments and guide capacity decisions.
• Champion a data-driven reliability mindset throughout engineering and product teams.

Infrastructure Automation & Management
• Architect and implement Infrastructure-as-Code in Pulumi/TypeScript for resources hosted on AWS (EKS, MSK, SingleStore, MongoDB, S3, etc.).
• Lead large-scale migration and modernisation projects (e.g., Kubernetes upgrades, multi-AZ resilience).
• Eliminate toil: any manual task that consumes more than 2 engineer-days per quarter, or that recurs frequently, becomes a candidate for automation.

Incident Response & Post-Mortem Leadership
• Participate in the on-call monitoring and response roster.
• Serve as escalation point and incident commander.
• Ensure post-mortems are published within 48 hours, with actionable “never again” tasks tracked to closure.
• Improve runbooks and game-day exercises; train engineers in incident command principles.

Security & Compliance
• Enforce least-privilege IAM policies and champion DevSecOps practices.
• Contribute to SOC 2 & ISO 27001 evidence collection and continuous control monitoring.
• Oversee security patch pipelines, vulnerability management, and secrets hygiene.

Operational Excellence & Continuous Improvement
• Own reliability KPIs (MTTR, change failure rate, mean time between failures).
• Lead quarterly reliability reviews and drive the reliability roadmap.
• Partner with Product on capacity forecasts and cost-optimisation initiatives.

Join the A-Team and experience the A-Life!

Similar Opportunities

Experienced Registered Behavior Technician for In-Home ABA Therapy - Atlanta, GA

Remote Full-time

Immediate Hiring: Experienced Registered Behavioral Technician (RBT) for Clinic-Based ABA Therapy Services

Remote Full-time

Experienced Registered Behavioral Technician (RBT) - ABA Therapy for Children with Autism Spectrum Disorder

Remote Full-time

Experienced Registered Nurse - Telehealth: Providing Remote Care Coordination and Patient Support

Remote Full-time

Experienced Substitute Teacher for Riverside County Schools - Join Scoot Education's Innovative Team

Remote Full-time

Experienced Substitute Teacher for San Bernardino County - Flexible Schedules & Competitive Pay

Remote Full-time

Experienced School Year Instructional Coach for High-Dosage Tutoring Programs in Edgewater Park, NJ

Remote Full-time

Experienced School Year Tutor for K-8 Students in Math and Literacy - Mickleton, NJ

Remote Full-time

Experienced Secondary Social Studies Teacher for Kansas - Flexible Hybrid Remote Arrangement

Remote Full-time

USPS Office Helper

Remote Full-time

Experienced Live Chat Agent – Customer Support and Service Excellence at arenaflex

Remote Full-time

Remote Student Customer Care Representative - Flexible Hours, International Team, and Career Growth Opportunities at LimeSurvey

Remote Full-time

Compliance Officer - Consulting Services

Remote Full-time

Experienced Customer Service Representative – Pet Parent Support Specialist at arenaflex

Remote Full-time

Full Stack Engineer - Backend (Ruby on Rails/React)

Remote Full-time

Claims Review Representative

Remote Full-time

Senior Clinical Coding Auditor & Trainer (RN Required), Anywhere NY

Remote Full-time

Surgical Charge Nurse Auditor

Remote Full-time

Generative AI Principal Consultant

Remote Full-time

Executive Assistant – 100% Remote – Philippines (US Clients)

Remote Full-time