Member of Engineering – Pre-training, Data Engineering

Remote Full-time
Job Description:
• Build and maintain high-performance pipelines for trillions of tokens.
• Deliver diverse, high-quality datasets for pre-training foundation models.
• Work closely with other teams, such as Pretraining, Posttraining, Evals, and Product, to ensure alignment on the quality of the models delivered.

Requirements:
• Strong background in building production-grade, distributed data systems for machine learning, with experience in:
  • Orchestration: Slurm, Airflow, or Dagster
  • Observability & Reliability: CI/CD, Grafana, Prometheus, etc.
  • Infra: Git, Docker, k8s, managed cloud services
  • Batched inference (e.g., vLLM)
• Performance obsession, especially with large-scale GPU clusters and distributed pipelines
• Expert-level Python knowledge and the ability to write clean, maintainable code
• Strong algorithmic foundations
• Proficiency with libraries like Polars, Dask, or PySpark
• Nice to have:
  • Experience in building trillion-scale SOTA pretraining datasets
  • Experience translating research to production at scale
  • Experience with OCR, web crawling, or evals
  • Prior experience pre-training LLMs

Benefits:
• Fully remote work & flexible hours
• 37 days/year of vacation & holidays
• Health insurance allowance for you and dependents
• Company-provided equipment
• Wellbeing, always-be-learning, and home office allowances
• Frequent team get-togethers
• Great, diverse & inclusive people-first culture
