AI Systems Engineer (LLM Performance, Cost & Reliability) | Audit → Recommend → Implement

Remote Full-time
Overview

Jules is a mobile AI-powered style and dating photo coach. We analyze outfit photos and dating profile images, score them, and give actionable feedback using LLMs and vision models.

The product is live, architected, and thoughtfully built.

What we need now is systems-level optimization.

We’re looking for a senior engineer to audit, optimize, and harden our LLM infrastructure — reducing latency and cost while improving reliability and consistency — without changing product flows or UX.

This is not a greenfield build.

This is not prompt polishing.

This is a real production system that needs to scale.

What You’ll Do

Phase 1

Audit all LLM usage across the system:

FitCheck (vision)

PicReview (vision)

Comparison modes

Conversational chat

Analyze:

Latency bottlenecks (user-perceived and backend)

Cost per request / feature / user

Model usage vs actual requirements

Prompt size, retries, determinism, and waste

Review existing cost instrumentation and update pricing assumptions

Deliverable:

A written audit outlining:

Current performance & cost profile

Clear problem areas

Ranked list of optimization opportunities with estimated impact

Phase 2 — Optimize & Implement

Implement agreed optimizations directly in the codebase, which may include:

Multi-model routing (cheap → expensive fallback)

Vision + text model rationalization

Caching (hash-based, context-based, or result reuse)

Async coordination improvements (queues, batching, retries)

Prompt minimization and structural refactors (not stylistic rewrites)

More accurate cost tracking and reporting

Ensure output stability and scoring consistency are preserved

Deliverable:

Merged code changes

Before/after latency and cost comparison

Clear documentation of decisions and tradeoffs

What You Will Not Do

To be explicit:

❌ Redesign product flows, UX, or scoring logic

❌ Rewrite Jules’ persona or tone

❌ “Improve” the product by adding features

❌ Push unnecessary infra churn before instrumentation

❌ Suggest fine-tuning as a first solution

Your job is to make the engine faster, cheaper, and more reliable, not to change the car.

Technical Environment (You’ll Be Working Inside This)

Frontend: React Native (Expo, TypeScript)

Backend: Node.js + Express

Database: MongoDB

AI: OpenAI (GPT-4o for vision, GPT-4.1-mini for chat)

Infra: Cloudinary (images), Firebase Auth, Segment, Sentry

Architecture: Async API calls, structured JSON outputs, prompt routing system

Full architecture documentation will be provided at the start of the engagement.

What We’re Looking For

Required

Deep experience optimizing production LLM systems

Strong intuition for cost vs latency vs quality tradeoffs

Hands-on backend engineering skills (Node.js)

Experience with:

model routing

async systems

caching strategies

deterministic LLM outputs

Nice to Have

Vision model experience

Experience evaluating multiple inference providers

Prior startup or zero-to-scale experience

Engagement Details

Type: Short-term contract

Length: TBD

Scope: Audit → Recommend → Implement

Potential extension: Yes, based on results

Timezone: Flexible, but the team works on Pacific Time
