Python Developers - US

Remote Full-time
Work Location: Remote, within the US

Engagement Model: Freelancer/Independent Contractor

Start Date: ASAP

DataForce by TransPerfect is looking for skilled Python Developers to architect, build, and own the data pipelines that power large language model (LLM) development.

Your primary mission will be to build scalable, automated systems that transform massive raw datasets into clean, model-ready formats. While your focus will be on data engineering, your expertise will also be valuable in collaborating on model training runs and experiments.

You are a strong fit for this role if you are a Python expert who thrives on solving large-scale data challenges and enjoys working at the intersection of data engineering and machine learning.

Role Responsibilities
• Design, develop, and own robust, scalable, and automated ETL/ELT pipelines in Python to ingest and process terabyte-scale text datasets.
• Implement rigorous data cleaning, deduplication, filtering, and normalization strategies, and define and enforce data quality standards to ensure high integrity for model training.
• Efficiently structure and format diverse datasets (e.g., JSON, Parquet) for consumption by LLM training frameworks.
• Work closely with AI researchers and ML engineers to understand data requirements, define metrics, and support the model training lifecycle.
• Continuously optimize data processing workflows for performance, cost efficiency, and reliability.
• Occasionally assist with launching and monitoring model training runs, and with debugging data-related issues that arise during them.

Role Requirements
• 5–10 years of professional experience in Python development, data engineering, data processing, or backend software engineering.
• Expert-level proficiency in Python and its data ecosystem (e.g., Pandas, NumPy, Dask, Polars).
• Proven experience building and maintaining large-scale data pipelines.
• Deep understanding of data structures, data modeling, and software engineering best practices (Git, CI/CD, testing).
• Experience handling and parsing diverse data formats (JSON, CSV, XML, Parquet) at scale.
• Excellent problem-solving skills and meticulous attention to detail.
• Strong communication and collaboration skills, with experience working in a team environment.

Preferred Role Requirements
• Hands-on experience with the data preprocessing pipeline for an LLM (e.g., LLaMA, BERT, GPT-family).
• Experience with big data frameworks like Apache Spark or Ray.
• Experience with Hugging Face libraries (Transformers, Datasets, Tokenizers).
• Familiarity with ML frameworks like PyTorch or TensorFlow.
• Proficiency with cloud platforms (AWS, GCP, Azure) and their data/storage services.

DataForce by TransPerfect is part of the TransPerfect family of companies, the world’s largest provider of language and technology solutions for global business, with offices in more than 100 cities worldwide.

We offer high-quality data for Human-Machine Interaction to some of the most prestigious technology companies in the world. Our department focuses on gathering, enriching, and processing data for Machine Learning in different AI domains. To learn more about DataForce, please visit us at https://www.transperfect.com/dataforce.

TransPerfect provides equal employment opportunity to all individuals regardless of their race, color, creed, religion, gender, age, sexual orientation, national origin, disability, veteran status, or any other characteristic protected by state, federal, or local law. For more information on the TransPerfect Family of Companies, please visit our website at www.transperfect.com.
