Senior Data Engineer with NLP Expertise Needed
We are seeking a Senior Data Engineer with substantial experience in Natural Language Processing (NLP), Artificial Intelligence, and Digital Humanities. The ideal candidate will be responsible for designing and implementing data pipelines and ensuring the integrity and quality of data. A strong background in NLP algorithms and familiarity with AI tools is essential. If you are passionate about leveraging data to enhance understanding in the digital humanities, we would love to hear from you.
Scope of Work
The contractor(s) will serve as an embedded technical partner, contributing specialized skills in data processing, text extraction evaluation, metadata enrichment, and data modeling to support improved discovery and analysis of historical collections. Work will be conducted collaboratively with Museum staff, with priorities and approaches adjusted as the project evolves.
Key areas of contribution include:
• Text extraction quality assessment and improvement, including analysis of OCR outputs, identification of common transcription errors, and testing of automated or semi-automated methods for improving machine-readable text derived from digitized documents.
• Automated extraction and normalization of entity metadata (such as names of people, places, and organizations) from historical text to support improved indexing and discovery.
• Data modeling and relationship mapping across datasets, including prototyping structured representations (such as knowledge graphs) that connect people, places, events, and organizations identified within the collections.
• Evaluation and validation support, including development or adaptation of lightweight data review and validation workflows for checking OCR output, entities, and relationships.
• Documentation and knowledge transfer, ensuring that methods, workflows, assumptions, and limitations are clearly captured for long-term reuse.
This scope emphasizes data exploration, quality assessment, and iterative refinement of data preparation workflows, rather than delivery of fixed production systems.
Required Skills and Capabilities
Contractor staff supporting this work should collectively demonstrate:
• Experience assessing and improving machine-generated text from OCR or similar extraction systems
• Experience applying computational methods (including machine learning where appropriate) to analyze and improve large text datasets
• Experience applying natural language processing techniques to extract entities from noisy or domain-specific historical text
• Experience developing or working with knowledge graphs and entity linking workflows
• Proficiency in Python-based data processing workflows and large-scale text analysis
Schedule:
15 hours per week for six months.