Lead Software Engineer - AI Operations and Tooling

Remote Full-time
Job Posting Title: Lead Software Engineer - AI Operations and Tooling
Req ID: 10137409

Job Description:

Disney Entertainment and ESPN Product & Technology

Technology is at the heart of Disney’s past, present, and future. Disney Entertainment and ESPN Product & Technology is a global organization of engineers, product developers, designers, technologists, data scientists, and more – all working to build and advance the technological backbone for Disney’s media business globally.

The team marries technology with creativity to build world-class products, enhance storytelling, and drive velocity, innovation, and scalability for our businesses. We are Storytellers and Innovators. Creators and Builders. Entertainers and Engineers. We work with every part of The Walt Disney Company’s media portfolio to advance the technological foundation and consumer media touch points serving millions of people around the world.

Here are a few reasons why we think you’d love working here:

Building the future of Disney’s media: Our Technologists are designing and building the products and platforms that will power our media, advertising, and distribution businesses for years to come.
Reach, Scale & Impact: More than ever, Disney’s technology and products serve as a signature doorway for fans' connections with the company’s brands and stories. Disney+. Hulu. ESPN. ABC. ABC News…and many more. These products and brands – and the unmatched stories, storytellers, and events they carry – matter to millions of people globally.
Innovation: We develop and implement groundbreaking products and techniques that shape industry norms, and solve complex and distinctive technical problems.

Ad Platforms is responsible for Disney’s industry-leading ad technology and products – driving advertising performance, innovation, and value in Disney’s sports, news, and entertainment content, across all media platforms.

Job Summary:

We are hiring a Lead Engineer to establish and guide our AI Operations and Tooling practice, enabling the safe, reliable, and cost-efficient operation of AI applications across AWS, Azure, and GCP. This role is focused on enabling AI-specific operations, such as hallucination testing, A/B evaluation, guardrail enforcement, and cost optimization, by leveraging, extending, and building around existing tools and platforms to accelerate operational stability and performance.

As a hands-on technical lead, you will mentor engineers, design operational enablement frameworks, and partner closely with AI engineering and product teams. The goal is not to own every tool, but to make AI systems more observable, testable, and resilient by enabling the right capabilities and automation around them. This role will deliver measurable business outcomes by preventing runaway spend, improving reliability, and driving efficiency in AI/cloud usage.

Responsibilities and Duties of the Role:

Operational Architecture & Enablement
Define frameworks for AI-specific operations: hallucination/quality testing, evaluation pipelines, and continuous validation.
Establish reference patterns for scaling LLM services, prompt orchestration, and multi-agent workloads.
Build automation for safe rollout, monitoring, and incident response.

Observability, Reliability & Cost Management
Implement end-to-end observability: latency, drift, failure modes, hallucination rates, and GPU/compute utilization.
Drive cost optimization and efficiency across AI cloud usage (AWS, Azure, GCP).
Define SLOs, dashboards, and runbooks for AI/LLM production systems.

Governance, Guardrails & Security
Embed compliance, safety checks, and prompt-injection defenses into operational frameworks.
Partner with security and governance teams to enforce enterprise-grade auditability and policy enforcement.

Leadership & Cross-Team Collaboration
Mentor engineers in DevOps, infra, and AI operations.
Drive adoption of best practices for AI reliability, test automation, and incident management.
Collaborate across AI Core, Data Foundations, Security, and Product teams to ensure operational safety and scale.

Basic Qualifications
Bachelor’s degree in Computer Science, Engineering, or related technical field (Master’s preferred), or equivalent experience.
7+ years of experience in software engineering, DevOps, or infrastructure, with at least 2 years in a lead role.
Expert in at least one foundational language (Python, Java, or Go) with production-grade system experience.
Hands-on experience with cloud-native infrastructure (AWS preferred; Azure/GCP a plus) and modern orchestration platforms.
Proven experience with observability stacks (Datadog, Prometheus, Grafana) and incident response automation.
Familiarity with AI/LLM APIs (OpenAI, Anthropic, Bedrock, Azure AI Foundry) and orchestration frameworks (LangChain, LangGraph).
Strong knowledge of operational AI testing (A/B evaluation, regression, red-teaming) and guardrail enforcement.
Demonstrated ability to optimize cloud/GPU usage and manage costs at scale.
Excellent communication skills and proven ability to lead design reviews, mentor engineers, and influence cross-functional teams.

Preferred Qualifications
Experience with AI-focused evaluation frameworks (LangSmith, PromptLayer, etc.).
Prior work in AI operations, SRE, or ML platform DevOps roles.
Knowledge of multi-agent orchestration patterns and operational reliability for AI systems.
Strong background in test automation and continuous validation for distributed systems.
Skilled at incident review (RCA) and driving operational excellence across large-scale environments.

#disneytech

The hiring range for this position is $141,900 – $190,300 in Los Angeles, CA; $155,400 – $208,400 in San Francisco, CA; and $148,700 – $199,400 in Seattle, WA.

The base pay actually offered will take into account internal equity and also may vary depending on the candidate’s geographic region, job-related knowledge, skills, and experience among other factors. A bonus and/or long-term incentive units may be provided as part of the compensation package, in addition to the full range of medical, financial, and/or other benefits, dependent on the level and position offered.

Job Posting Segment: Ad Platforms
Job Posting Primary Business: AP - Software Engineering
Primary Job Posting Category: Software Engineer
Employment Type: Full time
Primary City, State, Region, Postal Code: Glendale, CA, USA
Alternate City, State, Region, Postal Code: USA - CA - Market St
Date Posted: 2025-12-15
