Senior Data Engineer with NLP Expertise Needed

Remote Full-time

We are seeking a Senior Data Engineer with substantial experience in Natural Language Processing (NLP), Artificial Intelligence, and Digital Humanities. The ideal candidate will be responsible for designing and implementing data pipelines and ensuring the integrity and quality of data. A strong background in NLP algorithms and familiarity with AI tools is essential. If you are passionate about leveraging data to enhance understanding in the digital humanities, we would love to hear from you.

Scope of Work

The contractor(s) will serve as an embedded technical partner, contributing specialized skills in data processing, text extraction evaluation, metadata enrichment, and data modeling to support improved discovery and analysis of historical collections. Work will be conducted collaboratively with Museum staff, with priorities and approaches adjusted as the project evolves.

Key areas of contribution include:

• Text extraction quality assessment and improvement, including analysis of OCR outputs, identification of common transcription errors, and testing of automated or semi-automated methods for improving machine-readable text derived from digitized documents.

• Automated extraction and normalization of entity metadata (such as names of people, places, and organizations) from historical text to support improved indexing and discovery.

• Data modeling and relationship mapping across datasets, including prototyping structured representations (such as knowledge graphs) that connect people, places, events, and organizations identified within the collections.

• Evaluation and validation support, including development or adaptation of lightweight workflows for reviewing and validating OCR output, extracted entities, and relationships.

• Documentation and knowledge transfer, ensuring that methods, workflows, assumptions, and limitations are clearly captured for long-term reuse.

This scope emphasizes data exploration, quality assessment, and iterative refinement of data preparation workflows, rather than delivery of fixed production systems.

Required Skills and Capabilities

Contractor staff supporting this work should collectively demonstrate:

• Experience assessing and improving machine-generated text from OCR or similar extraction systems

• Experience applying computational methods (including machine learning where appropriate) to analyze and improve large text datasets

• Experience applying natural language processing techniques to extract entities from noisy or domain-specific historical text

• Experience developing or working with knowledge graphs and entity linking workflows

• Proficiency in Python-based data processing workflows and large-scale text analysis

Schedule:

15 hours per week for six months.

Apply Now
