Data Scientist (AI Quality & Evaluation)
About the Role
We're looking for a Data Scientist to own the quality, reliability, and trustworthiness of our clinical AI outputs. You'll build the systems that ensure our AI "knows what it doesn't know" by developing evaluation frameworks, calibrated confidence scoring, and automated quality assurance that physicians can actually trust.
What You'll Do
• Design and implement automated evaluation pipelines that assess AI output quality, accuracy, and safety at scale
• Develop uncertainty quantification systems where confidence scores meaningfully correlate with accuracy (see the calibration sketch after this list)
• Build comprehensive evaluation frameworks combining automated assessment with clinician-validated test cases
• Implement feedback loops that continuously improve model outputs based on validation signals
• Establish scalable quality gates that catch errors before they reach end users
• Contribute to model alignment and fine-tuning efforts
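For concreteness, one common way to check that confidence scores "meaningfully correlate with accuracy" is a calibration metric such as expected calibration error (ECE). The sketch below is a minimal, generic NumPy implementation for illustration only; the function name, binning scheme, and sample data are assumptions, not part of our stack.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Expected calibration error: the weighted mean absolute gap between
    each confidence bin's average confidence and its empirical accuracy."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    # Assign each prediction to one of n_bins equal-width confidence bins.
    bin_ids = np.minimum((confidences * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if not mask.any():
            continue
        gap = abs(correct[mask].mean() - confidences[mask].mean())
        ece += mask.mean() * gap  # weight by the bin's share of all samples
    return ece

# Hypothetical example: overconfident predictions produce a large ECE.
conf = np.array([0.95, 0.90, 0.92, 0.60, 0.55])
hits = np.array([1, 0, 0, 1, 0])
print(f"ECE = {expected_calibration_error(conf, hits):.3f}")
```

A well-calibrated system drives this gap toward zero, which is what lets a physician treat a 90% confidence score as meaning roughly 90% accuracy.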
Qualifications
Required
• Strong foundation in deep learning frameworks (PyTorch) and LLM architectures
• Experience with model evaluation, benchmarking, and quality metrics
• Proficiency in Python and modern ML development tools
• Strong statistical foundations
• Ability to read, implement, and extend research papers
• Excellent communication skills
Preferred
• Master's degree in Computer Science, Machine Learning, Statistics, or related quantitative field (PhD preferred)
• Publications in top ML/AI venues (NeurIPS, ICML, ICLR, ACL)
• Experience with RLHF, DPO, or preference optimization techniques
• Background in healthcare AI or regulated industries
• Experience building evaluation systems for production LLM applications
Apply to this job