Deep Learning Software Engineer, Inference and Model Optimization - New College Grad 2025

Remote Full-time
About the position

NVIDIA is at the forefront of the generative AI revolution! The Algorithmic Model Optimization Team focuses on optimizing generative AI models, such as large language models (LLMs) and diffusion models, for maximal inference efficiency, using techniques ranging from neural architecture search and pruning to sparsity, quantization, and automated deployment strategies. Our work includes conducting applied research to improve model efficiency as well as developing an innovative software platform (TRT Model Optimizer). Our software is used both internally across NVIDIA and externally by research and engineering teams developing best-in-class AI models. We are now looking for a Deep Learning Software Engineer to develop and scale up our automated inference and deployment solution. As part of the team, you will be instrumental in pushing the limits of inference efficiency and large-scale, automated deployment. Your work will touch on fundamental aspects of a typical machine learning stack, from working in high-level frameworks like PyTorch and HuggingFace to developing and improving high-performance kernel implementations in CUDA, TRT-LLM, and Triton.

Responsibilities
• Train, develop, and deploy state-of-the-art generative AI models like LLMs and diffusion models using NVIDIA's AI software stack.
• Leverage and build upon the torch 2.0 ecosystem (TorchDynamo, torch.export, torch.compile, etc.) to analyze and extract a standardized model graph representation from arbitrary torch models for our automated deployment solution.
• Develop high-performance optimization techniques for inference, such as automated model sharding techniques (e.g. tensor parallelism, sequence parallelism), efficient attention kernels with kv-caching, and more.
• Collaborate with teams across NVIDIA to use performant kernel implementations within our automated deployment solution.
• Analyze and profile GPU kernel-level performance to identify hardware and software optimization opportunities.
• Continuously innovate on inference performance to ensure NVIDIA's inference software solutions (TRT, TRT-LLM, TRT Model Optimizer) maintain and extend their leadership in the market.
• Play a pivotal role in architecting and designing a modular and scalable software platform to provide an excellent user experience with broad model support and optimization techniques to increase adoption.

Requirements
• Master's, PhD, or equivalent experience in Computer Science, AI, Applied Math, or a related field.
• Experience in Deep Learning.
• Excellent software design skills, including debugging, performance analysis, and test design.
• Strong proficiency in Python, PyTorch, and related ML tools (e.g. HuggingFace).
• Strong algorithms and programming fundamentals.

Nice-to-haves
• Contributions to PyTorch, JAX, or other Machine Learning Frameworks.
• Knowledge of GPU architecture and the compilation stack, and the ability to understand and debug end-to-end performance.
• Familiarity with NVIDIA's deep learning SDKs such as TensorRT.
• Experience in writing high-performance GPU kernels for machine learning workloads in frameworks such as CUDA, CUTLASS, or Triton.

Benefits
• Highly competitive salaries
• Comprehensive benefits package
• Equity opportunities

