Develop Existing NLP Legal Technology Software

Remote Full-time
Our software is a Python/Flask backend that identifies business-development-ready federal litigation opportunities from CourtListener and generates structured lead briefs and export bundles. The system detects high-signal discovery triggers (e.g., MTD denied, MTC granted, ESI protocol, 502(d), Rule 26(f)) from CourtListener HTML docket entries and PDF dockets (both are equally important), then enriches results via strict-JSON LLM summaries, scoring, and data-scope estimation. It uses SQLite caching and produces CSV/JSON/NDJSON/ZIP outputs.

We’re looking for someone to materially improve trigger detection accuracy, with a strong emphasis on reducing false negatives (missed triggers) across both HTML docket text and PDF-extracted text/OCR.

We already have a labeled dataset:

~9 discovery triggers

~100 labeled docket PDFs per trigger

For each trigger: ~50 true positives + ~50 false-positive samples (ground truth labels)

- Responsibilities:

1) Improve text extraction + normalization

Audit PDF extraction quality and upgrade the hybrid pipeline (native text + OCR fallback).

Normalize docket artifacts (pagination, headers/footers, spacing, line wraps, table-like formatting).

Add extraction “confidence” signals to drive fallbacks and debugging.
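To make the normalization and "confidence" items above concrete, here is a minimal sketch of what such helpers might look like. The function names, the word-likeness heuristic, and the 0.5 threshold are illustrative assumptions and are not taken from the existing codebase.

```python
import re


def normalize_docket_text(raw: str) -> str:
    """Collapse common PDF-extraction artifacts into single-spaced text."""
    text = raw.replace("\u00ad", "")               # drop soft hyphens
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)    # rejoin words split across lines
    text = re.sub(r"(?im)^\s*page \d+ of \d+\s*$", "", text)  # drop "Page X of Y" lines
    text = re.sub(r"\s+", " ", text)                # unwrap lines, collapse spacing
    return text.strip()


def extraction_confidence(text: str) -> float:
    """Hypothetical heuristic: share of tokens that look like ordinary words."""
    tokens = text.split()
    if not tokens:
        return 0.0
    wordlike = sum(
        1 for t in tokens if re.fullmatch(r"[A-Za-z]{2,}[.,;:)]?", t)
    )
    return wordlike / len(tokens)


def needs_ocr_fallback(native_text: str, threshold: float = 0.5) -> bool:
    """Assumed policy: fall back to OCR when native text looks mostly garbled."""
    return extraction_confidence(native_text) < threshold
```

One known limitation of this sketch: rejoining line-break hyphens also merges genuinely hyphenated terms (for example "court-" followed by "ordered"), so a real pipeline would likely need a dictionary check or similar safeguard.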

2) Upgrade trigger detection (recall-forward)

Strengthen existing regex/heuristic patterns to capture more true positives.

Add context-sensitive logic (windowing, docket-entry boundaries, negative patterns, temporal phrasing).

Implement disambiguation (e.g., “denied” vs “recommended denial”; “filed” vs “denied”; “granted in part”).

Ensure improvements apply to both HTML and PDF sources.
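As a rough illustration of the context-sensitive logic described above, the sketch below classifies a single docket entry for one trigger (motion to dismiss denied), using a proximity window, a negative pattern for recommendations, and an "in part" check. All names, patterns, and the 120-character window are assumptions made for illustration; they are not the existing rule set.

```python
import re

FLAGS = re.IGNORECASE
MTD = re.compile(r"\bmotions?\s+to\s+dismiss\b", FLAGS)
DENIAL = re.compile(r"\bden(?:ied|ying|ies)\b", FLAGS)
RECOMMENDED = re.compile(r"\brecommend(?:s|ed|ing|ation|ations)?\b", FLAGS)
ADOPTED = re.compile(r"\badopt(?:s|ed|ing)?\b", FLAGS)
IN_PART = re.compile(r"\bin\s+part\b", FLAGS)


def _near(a: re.Pattern, b: re.Pattern, text: str, window: int) -> bool:
    """True if any match of `a` starts within `window` chars of a match of `b`."""
    a_pos = [m.start() for m in a.finditer(text)]
    b_pos = [m.start() for m in b.finditer(text)]
    return any(abs(i - j) <= window for i in a_pos for j in b_pos)


def classify_mtd_entry(entry_text: str, window: int = 120) -> str:
    """Return 'denied', 'denied_in_part', or 'no_trigger' for one docket entry."""
    text = " ".join(entry_text.split())  # normalize whitespace within the entry
    if not MTD.search(text):
        return "no_trigger"
    # A bare recommendation is not a ruling; an order adopting one is.
    if RECOMMENDED.search(text) and not ADOPTED.search(text):
        return "no_trigger"
    if not _near(DENIAL, MTD, text, window):
        return "no_trigger"
    return "denied_in_part" if IN_PART.search(text) else "denied"
```

Because the function takes plain entry text, the same logic can run on entries split from HTML dockets and on entries segmented from PDF-extracted text.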

3) Build a professional evaluation + error analysis loop

Produce reproducible runs with per-trigger precision / recall / F1 and confusion breakdowns.

Create “miss analysis” tooling: for each false negative, show why it didn’t fire + suggested rule updates.

Add regression tests (pytest) to prevent future detection drift.
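For reference, per-trigger metrics of the kind listed above could be computed along these lines. This is a minimal sketch; the record layout of (trigger, gold label, predicted label) is an assumption about how the labeled dataset might be loaded.

```python
from collections import Counter


def per_trigger_metrics(records):
    """Compute confusion counts and precision/recall/F1 per trigger.

    `records` is an iterable of (trigger_name, gold: bool, predicted: bool).
    """
    counts = {}
    for trigger, gold, pred in records:
        c = counts.setdefault(trigger, Counter())
        if gold and pred:
            c["tp"] += 1
        elif gold:
            c["fn"] += 1   # missed trigger: the primary error to reduce
        elif pred:
            c["fp"] += 1
        else:
            c["tn"] += 1

    report = {}
    for trigger, c in counts.items():
        tp, fp, fn, tn = c["tp"], c["fp"], c["fn"], c["tn"]
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[trigger] = {
            "tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall, "f1": f1,
        }
    return report
```

A pytest regression test could then assert that recall for each trigger does not drop below the last accepted baseline.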

4) Integrate cleanly into the existing Flask codebase

Contribute PRs with clear documentation and maintainable structure.

Keep performance reasonable (avoid over-OCR; cache smartly; minimize repeated parsing).
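One possible way to avoid repeated parsing and OCR is to cache extracted text in SQLite keyed by a hash of the PDF bytes. The sketch below assumes a hypothetical table name and a caller-supplied `extract_fn`; it does not reflect the existing cache schema.

```python
import hashlib
import sqlite3


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a cache database with a single text table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS extracted_text ("
        "sha256 TEXT PRIMARY KEY, text TEXT NOT NULL)"
    )
    return conn


def cached_extract(conn: sqlite3.Connection, pdf_bytes: bytes, extract_fn) -> str:
    """Return cached text for these bytes, calling `extract_fn` only on a miss."""
    key = hashlib.sha256(pdf_bytes).hexdigest()
    row = conn.execute(
        "SELECT text FROM extracted_text WHERE sha256 = ?", (key,)
    ).fetchone()
    if row is not None:
        return row[0]
    text = extract_fn(pdf_bytes)  # expensive step: native parse and/or OCR
    conn.execute(
        "INSERT OR REPLACE INTO extracted_text (sha256, text) VALUES (?, ?)",
        (key, text),
    )
    conn.commit()
    return text
```

Keying on content rather than filename means a re-downloaded docket with identical bytes is never re-extracted.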

- Success criteria (what we will measure):

Demonstrated improvement in recall (primary) and overall F1 on the labeled dataset.

Reduced top false-negative root causes (extraction failures, phrasing variants, formatting artifacts).

A maintainable trigger framework: easier to add triggers and safer to iterate.

- Required experience:

Strong Python backend engineering (clean code, tests, reproducibility).

Prior experience with PDF parsing + OCR workflows (hybrid approaches strongly preferred).

Experience building or tuning rule-based NLP / text classification systems with evaluation harnesses.

Comfort with Flask + SQLite + pandas outputs + pytest.

- Nice to have:

Legal-tech / docket familiarity.

Experience designing labeling workflows and active error mining.

Experience with LLM structured outputs (strict JSON) + validation/normalization layers.

- Expected Engagement:

Contract / freelance, milestone-based preferred. The current listed budget is flexible, and we are open to negotiation.

Async-friendly.

Potential extension after initial accuracy lift.
