[Remote] AI Infrastructure Engineer at Hydra Host

Remote Full-time
Note: This is a remote position open to candidates in the USA. Hydra Host is a Founders Fund–backed NVIDIA cloud partner building the infrastructure platform that powers AI at scale. As an AI Infrastructure Engineer, you will work directly with AI platform customers to optimize their infrastructure on Hydra, focusing on Kubernetes clusters, GPU configurations, and onboarding automation.

Responsibilities
• Get AI platform customers production-ready on Hydra: standing up Kubernetes clusters, configuring GPU drivers, validating networking, and troubleshooting the issues that surface when real workloads hit real hardware
• Own the bare-metal-to-platform layer: bridging GPU infrastructure (NCCL, InfiniBand, NVLink, storage) with orchestration layers (Kubernetes, SLURM) and the MLOps tooling that customers actually use
• Configure, benchmark, and debug NVIDIA driver stacks: firmware versions, CUDA compatibility, NCCL tuning, and MIG configurations. Run quality benchmarks and diagnostics to validate performance for inference and training workloads across chip types
• Identify gaps before customers do: pressure-testing Hydra's infrastructure, APIs, and workflows to find what's missing or broken
• Turn customer learnings into product: working with Product and Engineering to build reusable templates, default configurations, and automated workflows that eliminate manual onboarding
• Advise customers on chip selection and tokenomics: helping AI platform customers understand price/performance trade-offs across GPU types, cost-per-token economics, and which hardware fits their inference or training workloads

Skills
• Bare metal Linux depth — you've administered GPU servers at the metal: driver stacks, kernel tuning, firmware, storage configuration. Not just managed K8s
• NVIDIA GPU stack expertise — drivers, CUDA, NCCL, NVLink, nvidia-smi profiling. You understand how stack compatibility affects performance
• Kubernetes and orchestration — production experience with K8s, SLURM, or similar. You know how to stand up clusters, not just deploy to them
• AI Networking fundamentals — TCP/IP, VLANs, bonding, and high-speed interconnects (InfiniBand, RoCE) for distributed workloads
• Customer-facing communication — you can work directly with engineers at AI platform companies, understand their constraints, and translate that into clear requirements for your team
• Bias toward scalable solutions — you'd rather build a feature that helps 10 customers than a custom deployment that helps 1
• HPC or large-scale distributed training environments
• AI workload experience (vLLM, PyTorch, inference frameworks)
• Storage systems (NVMe, distributed filesystems, CEPH, WEKA)
• IaC and provisioning tools (Terraform, Ansible, Cloud-init, MaaS)

Benefits
• Equity ownership — meaningful stake in what we're building
• Healthcare — medical, dental, vision for you and your family
• Remote-first — with hubs in Phoenix, Boulder, and Miami
• Direct impact — your work shapes how GPU infrastructure gets deployed across the AI ecosystem

Company Overview
• Hydra offers a bare metal GPU platform, connecting businesses to a variety of independent but standardized AI Factory Franchises. It was founded in 2021 and is headquartered in Miami, Florida, USA, with a workforce of 11-50 employees. Its website is https://www.hydrahost.com.

Apply Now

