Language Data Scientist II, AWS AI Data | Transcribe

Remote, Full-time
About the position

Responsibilities
• Translate business, modeling and ethical requirements in Health AI into executable data collection projects.
• Design human-in-the-loop evaluation tasks to measure the performance and usability of models in the medical domain.
• Develop the materials necessary to execute successful data collection efforts, such as guidelines, annotation interfaces, and quality assurance workflows.
• Support the sourcing and/or creation of high-quality language datasets and language artifacts for feature and language expansion.
• Analyze structured and unstructured data to provide actionable recommendations to improve data quality or model performance.
• Iterate and innovate on data collection methodologies to improve data turnaround time and reliability.
• Incorporate LLMs, prompt engineering, and ML techniques to automate repetitive annotation and data creation workflows.

Requirements
• 2+ years of experience as a data scientist.
• 3+ years of experience with data querying languages (e.g., SQL), scripting languages (e.g., Python), or statistical/mathematical software (e.g., R, SAS, Matlab).
• PhD in a field related to language and human behavior with a strong quantitative component (e.g., Cognitive Linguistics, Sociolinguistics, Human-Computer Interaction), or a Master's degree with 3+ years of field experience.
• Experience in data mining and cleaning for NLP machine learning model pipelines.
• Experience in language data collection for quantitative analysis, including guideline and workflow design.
• Experience in research and experimental design involving human participants.
• Experience in statistical measures for data quality assessment and research hypothesis testing.
• Practical knowledge of data labeling tools and techniques (e.g., Amazon SageMaker Ground Truth, brat, ELAN).
• Excellent knowledge of semantics, pragmatics, conversation analysis, and/or discourse analysis.
• Ability to explain complex concepts and solutions in easy-to-understand terms.

Nice-to-haves
• Experience with LLMs, prompt engineering techniques, and other programmatic approaches to annotation, including weak supervision and active learning.
• Practical knowledge of version control systems (e.g., Git).
• Experience with spoken data collection, speech analysis, and speech transcription (from scratch or ASR-assisted).
• Experience working with clinical or medical data, such as medical transcriptions, clinical notes, or electronic health records (EHRs).
• Knowledge of healthcare terminology and medical ontologies (e.g., SNOMED CT, ICD, RxNorm).

Benefits
• Medical, financial, and/or other benefits including equity and sign-on payments.
• Flexible working culture to support work-life balance.
• Mentorship and career growth resources.
• Employee-led affinity groups fostering a culture of inclusion.
