NLP Engineer (Remote)
ROLE SUMMARY
We are hiring a hands-on NLP Engineer to build robust pipelines that convert policy, regulatory, fintech, and healthcare documents into structured, graph-ready data. You will own the full extraction lifecycle, from raw text to clean, schema-validated outputs, using classical NLP, deep learning, and LLM APIs.

KEY RESPONSIBILITIES
- Pipeline Development: Design and build end-to-end text extraction pipelines for policy, regulatory, fintech, and healthcare documents
- Entity & Clause Extraction: Extract key entities (countries, companies, minerals) and structure policy clauses and obligations
- Deep Learning & Transformers: Fine-tune BERT / RoBERTa for NER, text classification, and relation extraction tasks
- LLM Integration: Leverage LLM APIs with structured output extraction, prompt engineering, and tool/function calling
- Data Engineering: Build scalable Python pipelines for high-volume document processing with robust pre-processing for PDF, DOCX, and HTML
- Schema & Graph Readiness: Define and enforce JSON schemas; ensure outputs are clean and compatible with knowledge graph ingestion
- Accuracy Improvement: Evaluate model performance, track metrics, and implement feedback loops to improve extraction quality over time

REQUIRED SKILLS
- 3–5 years of hands-on NLP engineering on real production pipelines, not just model experiments
- Strong Python skills: OOP, async programming, packaging, and testing
- NLP frameworks: spaCy, HuggingFace Transformers, NLTK
- Deep learning: fine-tuning transformer models for sequence labeling and classification
- LLM API integration: prompt engineering, structured outputs, and function/tool calling
- Data pipeline experience: ETL, batch processing, and text pre-processing at scale
- JSON schema design and validation using pydantic or JSON Schema

GOOD TO HAVE
- Experience with legal, regulatory, or policy documents (contracts, compliance filings, government publications)
- Familiarity with knowledge graphs or graph databases (Neo4j, RDF)
- Document parsing tools: pdfplumber, Docling, Apache Tika
- Domain knowledge in fintech or healthcare NLP
- Exposure to information extraction benchmarks (CoNLL, DocRED, SciERC)