Senior AI Scientist San Francisco; Remote

Remote Full-time
Position: Senior AI Scientist, San Francisco (Remote)

The company is an AI-powered search and productivity platform designed to empower users with personalized, efficient, and trustworthy search experiences. We combine advanced AI models with user-first principles to deliver tools that enhance discovery, creativity, and productivity. We are on a mission to create the most helpful search engine in the world, one that prioritizes transparency, privacy, and user control.

We’re building a team of innovators, problem-solvers, and visionaries who are passionate about shaping the future of AI and technology. Here, you’ll have the opportunity to work on impactful projects, collaborate with some of the brightest minds in the industry, and grow your career in an environment that values creativity, diversity, and curiosity. If you’re ready to make a difference and help us revolutionize the way people search and work, we’d love to have you join us!

About the Role

We’re hiring a Senior AI Scientist to lead the development of novel evals methodologies and customer-facing evaluation research. You’ll own the full loop: from identifying gaps in how we evaluate AI quality, to inventing new evals approaches, to deploying them in customer engagements and competitive analyses.

This role sits at the center of how we understand and improve our AI systems. You’ll work directly with customers to understand their unique quality requirements, design evals that capture what matters, and create reusable evaluation frameworks that scale across our customer base. You’ll also contribute to our evals research agenda, publishing work on evaluation methodologies for agents, RAG systems, and search-augmented AI.

The ideal candidate brings both a researcher’s rigor and a practitioner’s pragmatism: comfortable writing papers on evals methodology, and equally comfortable on sales calls explaining evaluation trade-offs to enterprise customers.
Responsibilities
• Define and own what “good” means for search-augmented and agentic AI systems by designing evaluation frameworks that measure real-world quality, reliability, and user-relevant behavior beyond standard benchmarks.
• Invent and validate novel evaluation methodologies for non-deterministic systems (LLMs, agents, RAG), including behavioral evals, long-tail and adversarial test sets, and task-specific metrics.
• Develop rigorous statistical frameworks for model comparison, regression detection, and uncertainty estimation, ensuring evaluation results are defensible and decision-ready.
• Build and maintain scalable evaluation systems (datasets, gold standards, eval harnesses, scoring pipelines, and analysis tooling) that can be reused across products and customers.
• Lead customer-facing evaluation research, working directly with enterprise customers to translate domain-specific quality requirements into credible, actionable evals that support product decisions and sales outcomes.
• Drive competitive evaluations and internal quality reviews, surfacing meaningful performance differences, trade-offs, and failure modes to inform product strategy and prioritization.
• Partner with engineering and product teams to integrate evals into development loops, release gating, and ongoing quality monitoring.
• Mentor and set standards for evaluation practice, reviewing eval designs, guiding other scientists, and shaping the long-term evals roadmap as systems become more agentic and complex.
• End-to-End Project Leadership: Lead the development of new AI-driven projects, encompassing ideation, prototyping, research, infrastructure design, scalability, monitoring, and evaluation.
• Rapid Iteration: Adapt quickly to user feedback and evolving requirements, ensuring continuous improvement in a fast-paced environment.

Qualifications
• Strong grounding in applied ML and statistics, with experience evaluating non-deterministic AI systems (LLMs, agents, RAG, search).
• Deep experience with AI evaluation, including metric design, gold dataset creation, head-to-head comparisons, slicing, and error analysis.
• Statistical rigor in model comparison, using methods such as paired tests, bootstrap confidence intervals, and robustness analyses.
• Proficiency in Python for evaluation and analysis, including building eval…
