Tool Use Expert

Remote Full-time
This description is a summary of our understanding of the job description.

Role Description
Mercor is partnering with an AI research organization to engage independent evaluation contractors who can assess agentic tool-use quality, specifically whether a model calls search appropriately and rewrites user prompts into effective queries. This short-term engagement focuses on high-accuracy judgments, clear rationales, and consistency across a large volume of model–rater traces. The work is well suited to experts in information retrieval, prompt engineering, and product QA who prefer remote, asynchronous projects.

Key Responsibilities
Review model interaction logs and decide whether invoking the search tool was appropriate given the initial prompt and context.
Evaluate the rewritten search query for clarity, specificity, and fidelity to the user’s intent.
Provide concise, evidence-based rationales tied to rubric criteria; label edge cases and ambiguities.
Score query quality (e.g., intent capture, keyword selection, operator use) and overall tool-use timing.
Calibrate against gold examples; surface rubric gaps and propose improvements.
Track decisions in a task portal; maintain high inter-rater agreement and meet throughput targets.
Flag potentially sensitive content according to the provided safety guidelines.

Qualifications
Excellent written communication; able to justify decisions succinctly with references to instructions and rubrics.
Meticulous attention to detail; comfortable working independently with minimal oversight.
Nice to have: familiarity with annotation tools, basic scripting (Python/SQL), and multilingual proficiency.

Requirements
Remote and asynchronous: contractors set their own hours.
Expected commitment: ~10–20 hours/week; flexible, project-based workload.
Duration: initial 6–10 weeks, with potential for additional task batches.
Shared resources and best-practice guides are provided; a support team is available for inquiries.

Compensation & Contract Terms
Compensation for completed work: estimated $45/hour equivalent or calibrated per-task rates based on complexity and geography (final rates confirmed before work begins).
Payments for services rendered are made via the platform (e.g., weekly through Stripe Connect, where available).
Independent contractor engagement; project-based statement of work; no employment relationship or benefits implied.

Application Process
Submit a brief profile (CV or LinkedIn) and note relevant evaluation/search experience.
Complete a short skills check and sample grading exercise to demonstrate rubric alignment.
If matched, you’ll sign a simple contract/NDA and receive task access details.
Typical follow-up comes within a few days after the sample review.

Company Description
Mercor is a talent marketplace connecting experts with leading AI labs and research groups. It is backed by Benchmark, General Catalyst, Adam D’Angelo, Larry Summers, and Jack Dorsey. Thousands of professionals across domains (research, engineering, law, and creative) partner with Mercor on frontier AI projects.
