Principal Architect, Site Reliability Engineering - GeForce Now

Remote Full-time
NVIDIA is the world leader in accelerated computing—from gaming to data centers to AI and robotics. We are a team of trailblazers reinventing computing at the intersection of graphics, high-performance computing, and AI. If you’re driven to tackle sophisticated challenges, push boundaries, and build technology that powers the future, NVIDIA is the place for you.

We are looking for an expert and transformative Principal Architect for Site Reliability Engineering (SRE) to join our GeForce Now Engineering team. In this role, you will define the architecture and strategic direction for NVIDIA’s highly available, scalable, and secure systems that power critically important services and platforms. You’ll collaborate with product, platform, and infrastructure teams to establish best practices, improve reliability, and drive the evolution of our SRE function. This is a highly specialized subject area that demands knowledge across systems, networking, coding, databases, capacity management, continuous delivery and deployment, and open-source cloud-enabling technologies such as Kubernetes.

What You Will Be Doing
• Design and architect scalable, resilient infrastructure for cloud-native and hybrid services.
• Define and implement SRE principles, SLAs, SLOs, and error budgets across teams and services.
• Collaborate with cross-functional teams to ensure reliability, observability, performance, and security.
• Lead architecture reviews, disaster recovery planning, incident response strategies, and postmortems.
• Develop automation frameworks for deployment, monitoring, and remediation of systems.
• Champion a culture of reliability, continuous improvement, and operational excellence.
• Mentor SREs and DevOps engineers, sharing knowledge and standard methodologies across the organization.

What We Need To See
• Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field (or equivalent experience).
• 15+ years of experience in infrastructure, cloud, or SRE roles, including at least 5 years in an architectural or technical leadership position.
• Expertise in cloud platforms (e.g., AWS, Azure, GCP) and container orchestration (Kubernetes).
• Deep understanding of distributed systems, microservices architecture, and CI/CD pipelines.
• Proficient with observability tools (Prometheus, Grafana, ELK/EFK, Datadog) and infrastructure as code (Terraform, Ansible).
• Strong programming/scripting skills (Python, Go, Bash, etc.).
• Ability to communicate your ideas and code clearly through documents, presentations, and similar formats.

Ways To Stand Out From The Crowd
• AWS, GCP, or Azure Professional Solution Architect Certification.
• Familiarity with parallel programming and distributed computing platforms.
• Experience in developing large-scale and complex applications.
• Cross-platform development experience.

NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and dedicated people in the world working for us. If you're creative and autonomous, we want to hear from you!

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 248,000 USD - 391,000 USD.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until August 2, 2025.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

JR1999812
