AI Evaluation Specialist

Remote Full-time
Binance is a leading global blockchain ecosystem behind the world’s largest cryptocurrency exchange by trading volume and registered users. We are trusted by over 280 million people in 100+ countries for our industry-leading security, user fund transparency, trading engine speed, deep liquidity, and an unmatched portfolio of digital-asset products. Binance offerings range from trading and finance to education, research, payments, institutional services, Web3 features, and more. We leverage the power of digital assets and blockchain to build an inclusive financial ecosystem to advance the freedom of money and improve financial access for people around the world.

We are seeking a dedicated AI Evaluation Specialist responsible for designing, implementing, and managing comprehensive evaluation frameworks that span the entire lifecycle of LLM agents, from pre-deployment testing to post-deployment monitoring and iterative refinement. Your work will directly influence Binance’s AI adoption journey by ensuring the reliability, adaptability, and governance compliance of AI agents operating across domains such as Customer Service, Growth, and Compliance.

Responsibilities:
• Participate in the entire software development lifecycle, from requirements analysis through test planning, execution, and defect tracking to product release and maintenance.
• Serve as the go-to person for AI agent evaluation and continuous monitoring.
• Create comprehensive, effective test strategies and perform hands-on testing to ensure the accuracy, reliability, and performance of AI and data applications.
• Perform effective root cause analysis of test failures and product issues, and drive optimizations for future enhancements.
• Design and develop internal tools that leverage AI technology to improve engineering and testing efficiency.

Requirements:
• Bachelor’s or Master’s degree in Computer Science, Artificial Intelligence, Data Science, or a related field.
• Strong understanding of Large Language Models (LLMs), autonomous AI agents, and their system architectures.
• Experience with AI evaluation methodologies, including offline benchmarking, online monitoring, and hybrid human-AI evaluation approaches.
• Familiarity with software engineering best practices such as Test-Driven Development (TDD), Behavior-Driven Development (BDD), and their limitations in AI contexts.
• Proficiency in designing adaptive, lifecycle-spanning evaluation frameworks that incorporate both quantitative and qualitative metrics.
• Experience with evaluation tools and frameworks (e.g., Opik, LangSmith) is a plus.
• Ability to analyze complex system-level behaviors, including reasoning pipelines, tool integrations, and emergent agent actions.
• Strong analytical skills with experience in data-driven diagnostics and root cause analysis.
• Excellent communication skills to document evaluation plans, results, and recommendations clearly.
• Experience working in cross-functional teams and managing feedback loops between evaluation and development.
• Experience collaborating with infrastructure or platform teams to improve AI tooling and automation platforms.

Additional Information

Why Binance
• Shape the future with the world’s leading blockchain ecosystem
• Collaborate with world-class talent in a user-centric global organization with a flat structure
• Tackle unique, fast-paced projects with autonomy in an innovative environment
• Thrive in a results-driven workplace with opportunities for career growth and continuous learning
• Competitive salary and company benefits
• Work-from-home arrangement (the arrangement may vary depending on the work nature of the business team)

Binance is committed to being an equal opportunity employer. We believe that having a diverse workforce is fundamental to our success.

By submitting a job application, you confirm that you have read and agree to our Candidate Privacy Notice.

