AI QA Lead – Agentic & Multi-Agent Systems (Contract & FTE Both)

Remote Full-time
Remote, Illinois 60007 Posted March 29th, 2026

Job Type: Full Time

Job Category: IT

Role - AI QA Lead – Agentic & Multi-Agent Systems
Location – Remote
Contract & FTE Both

Agentic QA Engineer – Generative AI & Agentic Systems (Agent, Multi‑Agent Testing)

Summary

We are seeking a hands-on AI QA Lead to design and execute end-to-end testing strategies for agentic AI solutions, including multi-agent systems in production-grade environments. This role partners with the Agentic Operations Team to ensure these systems meet their targets for resiliency, reliability, accuracy, latency, orchestration correctness, and scale. You will establish QA frameworks, build reusable test artifacts, drive macro-level validations across complex workflows, and lead the QA function for agentic AI from Dev to Prod.

Key Responsibilities:

Quality Strategy & Leadership

Define and own the QA strategy for agentic/multi-agent AI systems across dev, staging, and prod.

Mentor a team of QA engineers; establish testing standards, coding guidelines for test harnesses, and review practices.

Partner with Agentic Operations, Data Science, MLOps, and Platform teams to embed QA in the SDLC and incident response.

Agentic & Multi‑Agent Testing

Design tests for agent orchestration, tool calling, planner-executor loops, and inter-agent coordination (e.g., task decomposition, handoff integrity, and convergence to goals).

Validate state management, context windows, memory/knowledge stores, and prompt/graph correctness under varying conditions.

Implement scenario fuzzing (e.g., adversarial inputs, prompt perturbations, tool latency spikes, degraded APIs).

Reliability, Resiliency, and Latency

Create resilience testing suites: chaos experiments, failover, retries/backoff, circuit-breaking, and degraded mode behavior.

Establish latency SLOs and measure end-to-end response times across orchestration layers (LLM calls, tool invocations, queues).

Ensure reliability through soak tests, canary verifications, and automated rollbacks.

Accuracy & Macro-Level Validations

Define ground-truth and reference pipelines for task accuracy (exact match, semantic similarity, factuality checks).

Build macro validation frameworks that validate task outcomes across multi-step agent workflows (e.g., complex data pipelines, content generation + verification agent loops).

Instrument guardrail validations (toxicity, PII, hallucination, policy compliance).

Scale & Orchestration

Design load/stress tests for multi-agent graphs under scale (concurrency, throughput, queue depth, backpressure).

Validate orchestrator correctness (DAG execution, retries, branching, timeouts, compensation paths).

Engineer reusable test artifacts (scenario configs, synthetic datasets, prompt libraries, agent graph fixtures, simulators).

Dev-to-Prod Readiness

Integrate tests into CI/CD (pre-merge gates, nightly, canary) and production monitoring with alerting tied to KPIs.

Define release criteria and run operational readiness (performance, security, compliance, cost/latency budgets).

Build post-deployment validation playbooks and incident triage runbooks.

Required Qualifications:

7+ years in Software QA/Testing, with 2+ years in AI/ML or LLM-based systems; hands-on experience testing agentic/multi-agent architectures.

Strong programming skills in Python or TypeScript/JavaScript; experience building test harnesses, simulators, and fixtures.

Experience with LLM evaluation (exact/soft match, BLEU/ROUGE, BERTScore, semantic similarity via embeddings), guardrails, and prompt testing.

Expertise in distributed systems testing: latency profiling, resiliency patterns (circuit breakers, retries), chaos engineering, and message queues.

Familiarity with orchestration frameworks (LangChain, LangGraph, LlamaIndex, DSPy, OpenAI Assistants/Actions, Azure OpenAI orchestration, or similar).

Proficiency with CI/CD (GitHub Actions/Azure DevOps), observability (OpenTelemetry, Prometheus/Grafana, Datadog), and feature flags/canaries.

Solid understanding of privacy/security/compliance in AI systems (PII handling, content policies, model safety).

Excellent communication and leadership skills; proven ability to work cross-functionally with Ops, Data, and Engineering.

Preferred Qualifications:

Experience with multi-agent simulators, agent graph testing, and tooling latency emulation.

Knowledge of MLOps (model versioning, datasets, evaluation pipelines) and A/B experimentation for LLMs.

Background in cloud (AWS), serverless, containerization, and event-driven architectures.

Prior ownership of cost/latency/SLAs for AI workloads in production.
