[Hiring] Machine Learning Researcher, Audio @Bland
This description summarizes our understanding of the job posting. Click the 'Apply' button to find out more.
Role Description
As a Machine Learning Researcher at Bland, you'll be working on foundational research and development across the core components of our voice stack: speech-to-text, large language models, neural audio codecs, and text-to-speech. Your work will define how our agents understand, reason, and speak in real time at enterprise scale.
• Build and Scale Next-Generation TTS Systems
  • Design and train large-scale text-to-speech models capable of expressive, controllable, human-sounding output.
  • Develop neural audio codec-based TTS architectures for efficient, high-fidelity generation.
  • Improve prosody modeling, question inflection, emotional expression, and multi-speaker robustness.
  • Optimize for real-time, low-latency inference in production.
• Advance Speech-to-Text Modeling
  • Build and fine-tune large-scale ASR systems robust to accents, noise, telephony artifacts, and code-switching.
  • Leverage self-supervised pretraining and large-scale weak supervision.
  • Improve transcription accuracy for real-world enterprise scenarios, including structured extraction and conversational nuance.
• Pioneer Neural Audio Codecs
  • Research and implement neural audio codecs that achieve extreme compression with minimal perceptual loss.
  • Explore discrete and continuous latent representations for scalable speech modeling.
  • Design codec architectures that enable downstream generative modeling and controllable synthesis.
• Develop Scalable Training Pipelines
  • Curate and process massive audio datasets across languages, speakers, and environments.
  • Design staged training curricula and data filtering strategies.
  • Scale training across distributed GPU clusters, focusing on cost, throughput, and reliability.
• Run Rigorous Experiments
  • Design ablation studies that isolate the impact of architectural changes.
  • Measure improvements using both objective metrics and perceptual evaluations.
  • Validate ideas quickly through focused experiments that confirm or eliminate hypotheses.
Qualifications
• Experience with self-supervised learning, multimodal modeling, or generative modeling.
• Hands-on experience building or scaling TTS, STT, or neural audio codec systems.
• Familiarity with large-scale speech datasets and real-world audio variability.
• Experience training and serving large models on modern accelerators.
• Track record of designing controlled experiments and meaningful ablations.
• Comfortable in fast-moving startup environments.
Requirements
• Ability to derive new formulations and implement them efficiently.
• Strong intuition for audio quality, prosody, and conversational dynamics.
• Knowledge of inference optimization techniques, including quantization, kernel optimization, and memory efficiency.
• Understanding of real-time constraints in telephony or streaming environments.
• Ability to move quickly from hypothesis to validation.
• Strong ownership mindset from research through deployment.
• Excited by ambiguous, unsolved problems.
Benefits
• Healthcare, dental, vision, all the good stuff
• Meaningful equity in a fast-growing company
• Every tool you need to succeed
• Beautiful office in Jackson Square, SF with rooftop views
• Competitive salary: $160,000 to $250,000
Apply to this job