AI/LLM Evaluation & Alignment Software Engineer

Remote Full-time
At Leo Tech, we are passionate about building software that solves real-world problems in the Public Safety sector. Our software has been used to help fight continuing criminal enterprises and drug trafficking organizations, identify financial fraud, disrupt sex and human trafficking rings, and address mental health matters, to name a few.

Role
• This is a remote, work-from-home (WFH) role.
• As an AI/LLM Evaluation & Alignment Engineer on our Data Science team, you will play a critical role in ensuring that our Large Language Model (LLM) and agentic AI solutions are accurate, safe, and aligned with the unique requirements of public safety and law enforcement workflows. You will design and implement evaluation frameworks, guardrails, and bias-mitigation strategies that give our customers confidence in the reliability and ethical use of our AI systems.
• This is an individual contributor (IC) role that combines hands-on technical engineering with a focus on responsible AI deployment. You will work closely with AI engineers, product managers, and DevOps teams to establish standards for evaluation, design test harnesses for generative models, and operationalize quality assurance processes across our AI stack.

Core Responsibilities
• Build and maintain evaluation frameworks for LLMs and generative AI systems tailored to public safety and intelligence use cases.
• Design guardrails and alignment strategies to minimize bias, toxicity, hallucinations, and other ethical risks in production workflows.
• Partner with AI engineers and data scientists to define online and offline evaluation metrics (e.g., model drift, data drift, factual accuracy, consistency, safety, interpretability).
• Implement continuous evaluation pipelines for AI models, integrated into CI/CD and production monitoring systems (see the sketch at the end of this posting).
• Collaborate with stakeholders to stress-test models against edge cases, adversarial prompts, and sensitive data scenarios.
• Research and integrate third-party evaluation frameworks and solutions; adapt them to our regulated, high-stakes environment.
• Work with product and customer-facing teams to ensure explainability, transparency, and auditability of AI outputs.
• Provide technical leadership in responsible AI practices, influencing standards across the organization.
• Contribute to DevOps/MLOps workflows for deployment, monitoring, and scaling of AI evaluation and guardrail systems (experience with Kubernetes is a plus).
• Document best practices and findings, and share knowledge across teams to foster a culture of responsible AI innovation.

What We Value
• Bachelor's or Master's in Computer Science, Artificial Intelligence, Data Science, or a related field.
• 3–5+ years of hands-on experience in ML/AI engineering, with at least 2 years working directly on LLM evaluation, QA, or safety.
• Strong familiarity with evaluation techniques for generative AI: human-in-the-loop evaluation, automated metrics, adversarial testing, red-teaming.
• Experience with bias detection, fairness approaches, and responsible AI design.
• Knowledge of LLM observability, monitoring, and guardrail frameworks (e.g., Langfuse, LangSmith).
• Proficiency with Python and modern AI/ML/LLM/agentic AI libraries (LangGraph, Strands Agents, Pydantic AI, LangChain, Hugging Face, PyTorch, LlamaIndex).
• Experience integrating evaluations into DevOps/MLOps pipelines, preferably with Kubernetes, Terraform, ArgoCD, or GitHub Actions.
• Understanding of cloud AI platforms (AWS, Azure) and deployment best practices.
• Strong problem-solving skills, with the ability to design practical evaluation systems for real-world, high-stakes scenarios.
• Excellent communication skills to translate technical risks and evaluation results into insights for both technical and non-technical stakeholders.

Technologies We Use
• Cloud & Infrastructure: AWS (Bedrock, SageMaker, Lambda), Azure AI, Kubernetes (EKS), Terraform, ArgoCD.
• LLMs & Evaluation: Hugging Face, OpenAI API, Anthropic, LangChain, LlamaIndex, Ragas, DeepEval, OpenAI Evals.
• Observability & Guardrails: Langfuse, Guardrails AI.
• Backend & Data: Python (primary), Elasticsearch, Kafka, Airflow.
• DevOps & Automation: GitHub Actions, CodePipeline.

What You Can Expect
• Wor…
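To illustrate the kind of continuous-evaluation work described under Core Responsibilities, here is a minimal, hypothetical sketch of an offline evaluation check that a CI/CD pipeline could run before deployment. The case data, the `run_model` stub, the metric names, and the thresholds are illustrative assumptions only; they are not Leo Tech's actual frameworks (the posting lists Ragas, DeepEval, and OpenAI Evals for that purpose).

```python
"""Minimal sketch of an offline LLM evaluation check for a CI/CD gate.
All names here (EvalCase, run_model, thresholds) are hypothetical."""

import sys
from dataclasses import dataclass


@dataclass
class EvalCase:
    prompt: str
    expected_keywords: list[str]  # facts the response should mention
    forbidden_terms: list[str]    # content the guardrail must block


# Illustrative case; a real suite would load many cases from a dataset.
EXAMPLE_CASES = [
    EvalCase(
        prompt="Summarize the incident report for case 2024-118.",
        expected_keywords=["2024-118", "summary"],
        forbidden_terms=["social security number"],
    ),
]


def run_model(prompt: str) -> str:
    """Placeholder for the model under test; in practice this would call
    the deployed LLM endpoint through an internal client."""
    return "Summary of case 2024-118 for the requesting jurisdiction."


def score_case(case: EvalCase, response: str) -> dict:
    """Simple automated metrics: keyword recall as a rough factual-accuracy
    proxy, plus a guardrail flag for forbidden content."""
    text = response.lower()
    hits = sum(1 for kw in case.expected_keywords if kw.lower() in text)
    recall = hits / len(case.expected_keywords) if case.expected_keywords else 1.0
    violations = [t for t in case.forbidden_terms if t.lower() in text]
    return {"keyword_recall": recall, "guardrail_violations": violations}


def main(min_recall: float = 0.8) -> int:
    results = [score_case(c, run_model(c.prompt)) for c in EXAMPLE_CASES]
    avg_recall = sum(r["keyword_recall"] for r in results) / len(results)
    any_violation = any(r["guardrail_violations"] for r in results)
    print(f"avg keyword recall: {avg_recall:.2f}, violations: {any_violation}")
    # A non-zero exit code fails the CI job, blocking deployment on regressions.
    return 0 if avg_recall >= min_recall and not any_violation else 1


if __name__ == "__main__":
    sys.exit(main())
```

Run as a step in the pipeline (for example, `python eval_check.py` in a GitHub Actions job), the script's exit code acts as the deployment gate the posting describes; production systems would replace the stubbed model call and keyword heuristics with the evaluation frameworks listed above.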