Manager, Site Reliability Engineering - Cloud Platform

Full-time
Hybrid: This role is categorized as hybrid. The successful candidate is expected to report to Mountain View, CA; Austin, TX; or Warren, MI at least three times per week.

The rapid adoption of advanced software in vehicles marks a new era for automakers and consumers, bringing both advantages and challenges.

As part of Site Reliability Engineering (SRE) at General Motors, you'll join a dedicated team focused on enhancing the reliability, efficiency, and scalability of our distributed systems. We leverage engineering principles to manage operations effectively and build solutions that enable us to grow without sacrificing performance or quality. Our SREs work closely with software development teams, acting as specialists in reliability and production engineering, with a focus on automation, observability, and shared responsibility.

We are looking for individuals who are passionate about maintaining the health of our infrastructure while optimizing for reliability and cost efficiency. This role involves a blend of software engineering and systems engineering skills to keep our services resilient, robust, and scalable.

This is an Engineering Manager role. As an SRE Engineering Manager, you will be expected not only to lead your team in setting priorities and ensuring alignment with organizational goals, but also to be deeply technical. We expect our managers to be able to contribute directly through coding, reviewing code, and mentoring engineers.

While it's unlikely that you'll spend the majority of your time coding, having the capability and willingness to dive into technical details, solve problems hands-on, and support your team's technical decisions is crucial. You'll be a mentor, guide, and partner, helping engineers grow and ensuring the reliability and efficiency of the systems they work on. We believe in setting a high bar for engineering managers who can lead by example in both technical expertise and people leadership.

As part of the Cloud Platform Team, you will help build and run a self-service, multi-cloud developer platform capable of supporting thousands of services and hundreds of engineering teams, with a focus on reliability and cost efficiency. Our customers are other engineering teams at GM, and our goal is to provide them with self-service capabilities that create and manage public cloud infrastructure that is reliable, resilient to failures, easily monitored and observed, and cost-effective.

Key Responsibilities
• Automation and Reliability Improvements: Develop tools and software to automate operational processes, improve system reliability, and reduce manual intervention.
• Observability and Monitoring: Lead, implement, and improve monitoring and observability frameworks, enabling proactive detection and resolution of incidents.
• Incident Response: Participate in an on-call rotation to diagnose, troubleshoot, and mitigate production incidents, ensuring minimal downtime and swift resolution.
• Collaboration with Development Teams: Work alongside developers to ensure the quality, scalability, and reliability of our services. Practice shared ownership of services in production, fostering a "You build it, you run it" culture.
• Service Level Management: Define and track Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Service Level Agreements (SLAs) to manage reliability expectations effectively.
• Engineering for Reliability: Apply a strong understanding of common application reliability patterns, drawing on hands-on experience implementing them.
• Failure Analysis and Post-Incident Reviews: Conduct deep-dive analyses of incidents and collaborate on post-incident reviews to derive learnings and prevent recurrence. Champion a culture of continuous improvement.
• Cost Efficiency: Evaluate system performance and advocate for optimizations that reduce infrastructure costs while maintaining service reliability.
