Data Scientist - AI Evaluation

Remote Full-time
About Wizard
Wizard is the top-performing AI Shopping Agent, delivering the best products from across the web with unmatched accuracy, quality, and trust.
The Role
We’re looking for a Data Scientist to own how we measure, understand and improve the accuracy of our AI agent. This role sits at the intersection of data science, machine learning and product, and focuses on evaluation, experimentation and insight generation. You won’t be building models, but you will make sure they work in real-world scenarios. You will build the systems that measure what good looks like and partner closely with ML, AI Engineering and Product to continuously improve the agent’s performance.
What You’ll Do

Define and evolve accuracy metrics across the full shopping experience (retrieval, ranking, recommendations and outcomes)
Design and run experiments to measure improvements and regressions
Build and maintain evaluation datasets, benchmarks and scoring frameworks
Translate ambiguous product questions into clear, measurable hypotheses and analysis
Partner with ML Engineers to validate model changes and guide iteration
Identify failure modes and edge cases and drive improvements through data
Create dashboards and reporting that make agent performance visible, trusted and actionable

What Success Looks Like

Clear, trusted accuracy metrics are consistently used across product and engineering
A robust automated evaluation framework exists for both offline and live experiments
Model and product changes are consistently measured before and after launch

Ideal Background

4-6+ years in Data Science, ML Evaluation, Applied AI or similar roles
Deep experience evaluating AI/ML systems (ranking, recommendations, LLMs, etc.)
Strong experience with experimentation (A/B testing, causal inference)
Experience working on consumer products or user-facing systems, and exposure to marketplace or e-commerce systems
Ability to translate messy problems into structured analysis and metrics
Strong product mindset: you care about real user outcomes
Clear communication with the ability to influence across engineering and product

Compensation & Benefits
The expected base salary range for this role is $225,000 - $280,000 USD, and will vary based on skills, experience, role level, and geographic location. Final compensation will be determined by considering these factors alongside overall role scope and responsibilities.
In addition to base salary, Wizard offers:

Equity in the form of stock options
Medical, dental, and vision coverage
401(k) plan
Flexible PTO and company holidays
Fully remote work within the United States
Periodic company offsites and team gatherings

Wizard is committed to fair, transparent, and competitive compensation practices.
