Senior Machine Learning Systems Engineer (Training Optimization)

Remote, Full-time
Company Description: About the Group/Team

We're the CORE team within the Generative AI supergroup. Our mission is to invent foundational technologies that will power the future of AI-assisted design. From large-scale models to groundbreaking research, our team builds the technical core of Canva’s creative intelligence engine. We collaborate globally to ship research that makes a real impact, from smart editing to AI video tools, at massive scale.

Job Description: About the Role/Specialty

As a Senior Machine Learning Systems Engineer, you’ll lead efforts to scale and optimize the training system for our large-scale multimodal and foundation models. You’ll design distributed training systems using Megatron-LM, NVIDIA NeMo, FSDP, and Triton, pushing the limits of performance across compute, memory, and communication layers. You'll sit at the intersection of systems and AI research, directly shaping how we train the models that will power Canva’s next generation of products.

What you’ll do (responsibilities)

- You’ll design, implement, and optimize large-scale machine learning systems for training.
- You’ll improve all aspects of performance, including GPU utilization, communication overhead, and memory efficiency.
- You’ll partner with research and modeling teams to align systems with algorithmic needs.
- You’ll evaluate and apply best practices for distributed training using industry-leading frameworks.
- You’ll dive deep into low-level optimization, including custom CUDA or Triton kernels.
- You’ll debug, profile, and fine-tune training workflows to unlock new levels of scalability.

Qualifications: What we're looking for

We’re looking for a systems-first engineer who thrives in fast-paced, high-impact environments. You’re deeply familiar with distributed model training at scale and understand the nuances of optimizing compute at every level of the stack. You're excited by challenges that stretch current boundaries, and you’re a strong collaborator who communicates clearly across domains.

- Strong background in LLMs, multimodal AI, or diffusion models.
- Proficiency in Python. Familiarity with a systems programming language (e.g. C++ or Rust) is a plus.
- Deep knowledge of PyTorch or JAX as well as libraries such as Megatron-LM, NeMo, or DeepSpeed.
- Familiarity with common optimization techniques such as FSDP/ZeRO, gradient checkpointing, or low-precision data types.
- Hands-on experience writing custom GPU kernels in CUDA or Triton.
- Excellent communication and problem-solving skills, including full proficiency in English.