Machine Learning Engineer - ML Training Platform

Remote Full-time
Overview Pluralis Research is pioneering Protocol Learning —a fully decentralised way to train and deploy AI models that opens this layer to individuals rather than well resourced corporates. By pooling compute from many participants, incentivising their efforts, and preventing any single party from controlling a model’s full weights, we’re creating a genuinely open, collaborative path to frontier-scale AI. We’re looking for an ML Training Platform Engineer to architect, build, and scale the foundational infrastructure powering our decentralized ML training platform. You will own core systems spanning infrastructure orchestration, distributed compute, and services integration, enabling continuous experimentation and large-scale model training. Responsibilities Multi-Cloud Infrastructure : Design resource management systems provisioning and orchestrating compute across AWS, GCP, and Azure using infrastructure-as-code (Pulumi/Terraform). Handle dynamic scaling, state synchronization, and concurrent operations across hundreds of heterogeneous nodes. Distributed Training Systems : Architect fault-tolerant infrastructure for distributed ML. GPU clusters, NVIDIA runtime, S3 checkpointing, Large dataset management and streaming, health monitoring, and resilient retry strategies. Real-World Networking : Build systems that simulate and handle real-world network conditions — bandwidth shaping, latency injection, packet loss — while managing dynamic node churn and ensuring efficient data flow across workers with heterogeneous connectivity, because our training happens on consumer nodes and non co-located infrastructure, not in a datacenter. What You’ll Bring Ideally, you’ll have 5+ years of work experience with deep experience in: Infrastructure & Platform Engineering: Production experience with infrastructure-as-code (Pulumi/Terraform/CloudFormation) managing multi-cloud deployments, lifecycle orchestration, self-healing systems, Docker/Kubernetes (EKS), GPU workloads, and heterogeneous clusters at scale. Distributed Systems & ML Infrastructure: Deep understanding of distributed training workflows, checkpointing, data sharding, model versioning, long-running job orchestration, decentralized networking (P2P, NAT traversal, traffic shaping), and real-world bandwidth constraints. Systems Programming & Reliability: Strong Python engineering (asyncio, concurrency, retry logic, cloud SDKs, CLI tooling) with hands-on experience in observability, SRE practices, monitoring (Prometheus/Grafana), performance profiling, and incident response. What we’re looking for Experience in a startup environment with an emphasis on micro-services orchestration or big tech background Deep understanding of multi-cloud infra & distributed training systems A team player with high attention to detail A strong passion to join Backed by Union Square Ventures and other tier-1 investors, we’re a world-class, deeply technical team of ML researchers. Pluralis is unapologetically ideological. We view the world as a better place if we are able to implement what we are attempting, and Protocol Learning as the only plausible approach to preventing a handful of massive corporations monopolising model development, access and release, and achieving massive economic capture. If this resonates, please apply.
Apply Now

Similar Opportunities

Experienced Registered Behavior Technician for In-Home ABA Therapy - Atlanta, GA

Remote Full-time

Immediate Hiring: Experienced Registered Behavioral Technician (RBT) for Clinic-Based ABA Therapy Services

Remote Full-time

Experienced Registered Behavioral Technician (RBT) - ABA Therapy for Children with Autism Spectrum Disorder

Remote Full-time

Experienced Registered Nurse - Telehealth: Providing Remote Care Coordination and Patient Support

Remote Full-time

Experienced Substitute Teacher for Riverside County Schools - Join Scoot Education's Innovative Team

Remote Full-time

Experienced Substitute Teacher for San Bernardino County - Flexible Schedules & Competitive Pay

Remote Full-time

Experienced School Year Instructional Coach for High-Dosage Tutoring Programs in Edgewater Park, NJ

Remote Full-time

Experienced School Year Tutor for K-8 Students in Math and Literacy - Mickleton, NJ

Remote Full-time

Experienced Secondary Social Studies Teacher for Kansas - Flexible Hybrid Remote Arrangement

Remote Full-time

USPS Office Helper

Remote Full-time

Career Transition & Lifestyle Strategist – Remote / Self-Employed

Remote Full-time

Field Service Representative

Remote Full-time

Operations Manager, PCP (Peru, Colombia & Porto Rico)

Remote Full-time

Client Success Specialist | Remote | Flexible

Remote Full-time

100% Remote Golang Developer with Devops/LLM exp. W2 Consultant

Remote Full-time

**Experienced Part-Time Data Entry Claims Intake Processor – Remote Opportunity with arenaflex**

Remote Full-time

Senior Analyst – JET Blue Airlines Remote Data Entry & Busi – Amazon Store

Remote Full-time

RFP Quality Assurance Writer/Editor – US Remote in Minnetonka, MN

Remote Full-time

Emergency Department Multifunction Risk Sitter PRN

Remote Full-time

User Experience Research

Remote Full-time
← Back to Home