Senior Machine Learning Researcher

Remote Full-time

Company Overview:

We are building Protege to solve the biggest unmet need in AI: getting access to the right training data. The process today is time-intensive, incredibly expensive, and often ends in failure. The Protege platform facilitates the secure, efficient, and privacy-centric exchange of AI training data.

Solving AI’s data problem is a generational opportunity. We’re backed by world-class investors and already powering partnerships with some of the most ambitious teams in AI. The company that succeeds will be one of the largest in AI — and in tech.

We’re a lean, fast-moving, high-trust team of builders who are obsessed with velocity and impact. Our culture is built for people who thrive on ambiguity, own outcomes, and want to shape the future of data and AI.

Role Overview:

Data is the foundation of AI performance, and we believe model quality starts with data quality. You’ll be at the heart of shaping how we curate, assess, and prepare the training data that powers real-world AI systems.

We’re seeking a Senior Member of the Core Data Team / Principal Scientist to lead the evaluation and optimization of large-scale datasets used to train state-of-the-art AI models. In this role, you’ll help define what "high-quality data" means in practice, using statistical, computational, and ML-driven methods to ensure our data is diverse, representative, and high-impact. You’ll work closely with research and engineering teams to improve model performance through better data. This is an ideal role for someone with a PhD in machine learning, CS, or a related applied field who is passionate about the role of data in AI training and excited to advance Protege’s mission to become the ubiquitous platform for AI training data.

Key Responsibilities:
• Design and apply statistical and machine learning methods to curate, filter, and enrich large-scale unstructured datasets
• Develop frameworks to assess data diversity, duplication, and informativeness. Design statistical approaches to de-risk training datasets
• Collaborate with model training teams to identify data bottlenecks and optimize dataset performance, with an emphasis on the ability to work with both large foundation-model developers and smaller startups
• Provide leadership on data quality strategy and shape internal best practices
• Evaluate external datasets for integration, focusing on scalability, quality, and relevance to model performance. Help build data scorecards
• Contribute to research and development of tools that automate data preprocessing and validation

About You:
• PhD, or a Master's degree plus 4+ years of industry experience, in machine learning, economics, mathematics, engineering, computer science, statistics, or a related quantitative field
• Strong understanding of AI model training pipelines, including pre-processing and evaluation
• Experience working with large, unstructured datasets, especially text
• Background in statistical analysis, bias detection, and data validation
• Able to identify high-impact problems and drive independent solutions

Bonus if you have these attributes:
• Experience with synthetic data generation or augmentation strategies
• Publications or open-source contributions in data-centric AI or related areas
• Experience developing evaluation frameworks or performance metrics for training data
• Cross-functional collaboration with product, infrastructure, or partnership teams
