[Remote] Principal Engineer, Compute Fleet Management

Remote Full-time
Note: This is a remote position open to candidates in the USA. Databricks is a data and AI company focused on enabling data teams to solve complex problems through their infrastructure platform. The Principal Engineer for Compute Fleet Management will lead efforts in optimizing compute resources across major cloud platforms, ensuring high availability and efficiency.

Responsibilities
• Pioneering Fleet Optimization: Provisioning and pooling of O(Billion)s of cloud resources to achieve peak workload performance, industry-leading efficiency, and robust resource isolation
• Delivering Hyper-Scale Resilience: Build the architecture that guarantees horizontal scaling and resilience against zonal or even cloud account-level failures, ensuring Databricks is always on
• Owning the Critical Path: Lead the development of the lowest-dependency systems required to bootstrap and manage our massive compute platform
• High Availability: Achieve and maintain 99.99% availability for all batch and serving workloads
• Stellar Efficiency: Drive utilization to 60% or higher, a crucial metric that requires balancing high efficiency with unwavering tolerance for cloud failures
• Best-in-Class Isolation: Architect and enforce strong security and performance isolation across a diverse range of customer workloads

Skills
• Leading Transformative Projects: Taking ownership of complex, cross-team, cross-layer, and multi-quarter strategic engineering initiatives from concept to execution
• Distributed Systems Mastery: Deep, hands-on experience developing and operating high-scale distributed systems on at least one major public cloud
• Influence Without Authority: Proven ability to drive consensus, establish technical direction, and lead large technical efforts across organizational boundaries
• Execution Discipline: Exceptional strength in planning, tracking project progress, and managing complex cross-organizational dependencies
• Experience managing and scaling a massive fleet of GPUs for AI/ML workloads
• Experience with developing and operating large-scale distributed systems across all major clouds (AWS, Azure, and GCP)

Benefits
• Eligibility for annual performance bonus
• Equity
• Comprehensive benefits and perks

Company Overview
• Databricks is a data and AI platform that unifies data engineering, analytics, and machine learning on a lakehouse architecture. It was founded in 2013, and is headquartered in San Francisco, California, USA, with a workforce of 5001-10000 employees. Its website is https://www.databricks.com.

Company H1B Sponsorship
• Databricks has a track record of offering H1B sponsorships, with 385 in 2025, 319 in 2024, 227 in 2023, 222 in 2022, 166 in 2021, 64 in 2020. Please note that this does not guarantee sponsorship for this specific role.
