Software Engineer - AI Evals and Test

Remote Full-time
About P-1 AI:

We are building an engineering AGI. We founded P-1 AI with the conviction that the greatest impact of artificial intelligence will be on the built world—helping mankind conquer nature and bend it to our will. Our first product is Archie, an AI engineer capable of quantitative and spatial reasoning over physical product domains that performs at the level of an entry-level design engineer. We aim to put an Archie on every engineering team at every industrial company on earth.

Our founding team includes the top minds in model-based engineering, deep learning, and the industries we serve. We just closed a $23 million seed round led by Radical Ventures that includes a number of other AI and industrial luminaries. We invite you to join our team of the world’s best engineers and AI researchers, building AI’s most impactful use case.

About the Role:

In this role, you’ll be responsible for the evals that we use to ensure that Archie is learning and retaining the skills it needs to perform its engineering work, and to benchmark it against industry skill expectations. Working within a small, tightly knit team of high performers, you’ll be principally responsible for clearly defining, implementing, and validating these evals, incorporating input from our engineering experts and industrial partners. You’ll also be responsible for translating the evals into multiple formats for use with different types of AI and non-AI systems and agents.

This role is remote and you can be based anywhere in the US or Canada, where you must have existing work authorization. You will be expected to travel to our San Francisco office for co-working sessions approximately one week out of every six. If you are already located in the SF Bay Area or are interested in relocation, you are of course welcome to work out of our SF office.

Responsibilities:


Implement the system for organizing, transforming, running, grading, and reporting on eval benchmarks.


Ensure that evals run effectively within our CI/CD system, continuously benchmarking our evolving AI platform and the experiments we’re performing around it.


Work with our industrial partners, AI team, and engineering experts to gather and refine the evals.


Create methods for detecting and testing for common quality challenges of AI, including hallucinations, undesirable stochasticity, and regressions.


Be a technical leader in the consistent implementation and organization of automated tests across other areas of our technology stacks.



Skills:


Experience in constructing comprehensive test suites for software and/or AI systems.


Experience designing metrics to evaluate systems and visualize their performance, including differences across successive generations.


Experience in developing, managing, and running evals against LLM-based systems is a strong plus.


Good communication skills with a variety of stakeholders (AI researchers, domain experts, application developers).


Proficiency in Python programming, including complex modules, and in modern software development tools and practices (Git, CI/CD, etc.).


Ability to thrive in a fast-paced, dynamic startup environment.



Interview Process:


Initial screening - with Head of Talent (30 mins)


Hiring manager interview - with co-founder & Head of Engineering (45 mins)


Programming interview - with member of technical staff & Head of Engineering (60 mins). Bring your own dev environment and tools.


Culture fit / Q&A - with co-founder & CEO (45 mins)



