Senior High-Performance LLM Training Engineer

Remote Full-time
About the position

We are now looking for a Senior High-Performance LLM Training Engineer! NVIDIA is seeking experienced engineers specializing in performance analysis and optimization to improve the efficiency of LLM training workloads, which are shaping the world's most advanced computing systems. This position focuses on optimizing NVIDIA’s high-performance LLM software stack in frameworks like PyTorch and JAX for high-performance training on thousands of GPUs, while also helping shape hardware roadmaps for the next generation of GPUs powering the AI revolution.

GPU computing is the most productive and pervasive platform for deep learning and AI. It begins with the most advanced GPUs and the systems and software we build on top of them. We integrate and optimize every deep learning framework. We work with the major systems companies and every major cloud service provider to make GPUs available in data centers and in the cloud. We craft computers and software to bring AI to edge devices, such as self-driving cars and autonomous robots. AI has the potential to spur a wave of social progress unmatched since the industrial revolution.

Widely considered to be one of tech's most desirable employers, NVIDIA offers highly competitive salaries and a comprehensive benefits package. Additionally, this opportunity offers you the ability to collaborate with some of the most forward-thinking and hard-working people in the world, shaping the future of AI in a creative and autonomous work environment that encourages innovation. If you're excited to work across the full hardware & software stack, from GPU architecture to application code, to achieve optimal performance, we want to hear from you!

Responsibilities

• Understand, analyze, profile, and optimize AI training workloads on innovative hardware and software platforms.
• Understand the big picture of training performance on GPUs, prioritizing and then solving problems across all state-of-the-art neural networks.
• Implement production-quality software in multiple layers of NVIDIA's deep learning platform stack, from drivers to DL frameworks.
• Build and support NVIDIA submissions to the MLPerf Training benchmark suite.
• Implement key DL training workloads in NVIDIA's proprietary processor and system simulators to enable future architecture studies.
• Build tools to automate workload analysis, workload optimization, and other critical workflows.

Requirements

• PhD in Computer Science, Electrical Engineering or Computer Engineering and 5+ years; or MS (or equivalent experience) and 8+ years of meaningful work experience.
• Strong background in deep learning and neural networks, in particular training.
• A deep background in computer architecture and familiarity with the fundamentals of GPU architecture.
• Proven experience analyzing and tuning application performance & processor and system-level performance modelling.
• Programming skills in C++, Python, and CUDA.
