Senior MLOps Platform Engineer {S} with Security Clearance

Remote Full-time
Position: Senior MLOps Platform Engineer {S} with Security Clearance
ARKA Group L.P. ("ARKA") is an advanced technologies company serving the U.S. military, intelligence community, and commercial space industry, delivering next-generation solutions to support the national security space enterprise. Built on more than six decades of excellence, ARKA brings modern approaches and a culture of innovation to the challenges of today. Join the ARKA team to learn how Beyond Begins Here.

Position Overview:

Our AI Center of Excellence builds the next generation of Agentic AI products that autonomously reason, plan, and act on behalf of our customers. To deliver these capabilities at scale, we need a platform engineering group that provides a robust, secure, and highly available MLOps foundation across both on-premises clusters and AWS. The team works closely with data scientists, product engineers, and SREs to turn experimental models into reliable services that power mission-critical applications.

In support of work/life balance, many positions offer a flexible schedule within the pay period. Ask us about the opportunity for flex scheduling if that's of interest to you.

Why join us:
• Shape the end-to-end lifecycle of cutting-edge AI services, from model training to production inference.
• Influence architecture decisions for a hybrid cloud environment that will serve thousands of concurrent agents.
• Collaborate with world-class researchers and product teams while enjoying a strong engineering culture focused on automation, observability, and reliability.

Responsibilities:
• Design, implement, and operate a unified MLOps platform that supports both on-premises Kubernetes clusters and AWS. The platform should enable rapid onboarding of new Agentic AI services and provide consistent governance across environments.
• Develop reusable CI/CD pipelines (GitLab CI) for model packaging, containerization, automated testing, canary releases, and rollbacks.
• Build observability, monitoring, and alerting stacks (Prometheus, Grafana, OpenTelemetry, CloudWatch) to track inference latency, throughput, resource utilization, and data drift for real-time and batch workloads.
• Create self-service tooling (CLI, SDKs, UI dashboards) that allows data science and product teams to register models, define inference endpoints, and manage versioning without deep DevOps involvement.
• Architect and maintain data pipelines that feed training data, model artifacts, and inference logs into a governed data lake (S3, on-premises object store).
• Collaborate with research and product engineers to translate experimental Agentic AI prototypes into production-grade services, ensuring reproducibility, security, and compliance.
• Drive performance optimization for inference workloads (GPU/CPU scaling, model quantization, batching strategies).
• Champion best practices in security (IAM, network policies, secret management), cost efficiency, and disaster recovery for the hybrid infrastructure.
• Mentor junior engineers and contribute to internal knowledge bases, upskilling, and review processes.

Required Qualifications:
• BS in computer science or related engineering field
• 5+ years of experience building and operating production-grade software infrastructure, preferably in a hybrid on-premises/cloud environment
• Deep expertise with Kubernetes (cluster provisioning, Helm, operators, custom resources) and container runtimes (Docker, OCI)
• Hands-on experience with AWS services (EKS, SageMaker, S3, IAM, CloudWatch, Step Functions) and the ability to bridge on-premises resources with AWS via VPN/Direct Connect
• Strong software engineering skills in Python and at least one compiled language (Go, Rust, or Java) for building platform components and SDKs
• Proficiency with CI/CD and GitOps tooling (Argo CD, Flux, GitLab, GitHub Actions, or similar)
• Solid understanding of distributed systems (consensus, fault tolerance, load balancing) and experience tuning high-throughput, low-latency inference pipelines
• Experience with data engineering frameworks (Airflow, Prefect, Kafka, Spark, Flink) and building robust, versioned data pipelines
• Familiarity with observability stacks (Prometheus, Grafana, OpenTelemetry, ELK) and the ability to define meaningful SLIs/SLOs for AI services
• Track record of collaborating with research or product teams to move prototypes to production, translating experimental code into maintainable services
• Strong problem-solving mindset, excellent written and verbal communication, and a passion for building scalable AI platforms

Preferred Qualifications:
• Working knowledge of Scrum and Agile software development methodology

Location:

Remote. This is a remote position that will primarily support our Aurora, CO and King of Prussia, PA locations. Due to contract requirements, the job must be performed from a location within the United States.

What We Offer:
• Comprehensive medical/vision/dental insurance packages
• Company contributions to qualified HSA accounts
• …

Similar Opportunities

Experienced Registered Behavior Technician for In-Home ABA Therapy - Atlanta, GA

Remote Full-time

Immediate Hiring: Experienced Registered Behavioral Technician (RBT) for Clinic-Based ABA Therapy Services

Remote Full-time

Experienced Registered Behavioral Technician (RBT) - ABA Therapy for Children with Autism Spectrum Disorder

Remote Full-time

Experienced Registered Nurse - Telehealth: Providing Remote Care Coordination and Patient Support

Remote Full-time

Experienced Substitute Teacher for Riverside County Schools - Join Scoot Education's Innovative Team

Remote Full-time

Experienced Substitute Teacher for San Bernardino County - Flexible Schedules & Competitive Pay

Remote Full-time

Experienced School Year Instructional Coach for High-Dosage Tutoring Programs in Edgewater Park, NJ

Remote Full-time

Experienced School Year Tutor for K-8 Students in Math and Literacy - Mickleton, NJ

Remote Full-time

Experienced Secondary Social Studies Teacher for Kansas - Flexible Hybrid Remote Arrangement

Remote Full-time

USPS Office Helper

Remote Full-time

Foundations in Education Teacher

Remote Full-time

Seasonal PT Remote Contact Center Representative - Closing Shift

Remote Full-time

Senior DevOps Engineer

Remote Full-time

Data Entry Assistant (100% Remote)

Remote Full-time

Associate Preconstruction Project Engineer - Battery Storage

Remote Full-time

Experienced Remote Customer Service Representative – Delivering Exceptional Support and Consultation to Foster Critical Thinking and Learning Excellence

Remote Full-time

Experienced Remote Data Entry Specialist – Work-from-Home Opportunity with arenaflex

Remote Full-time

Lead ROI Medical Records Specialist - Remote (After Hours - Evening Shift)

Remote Full-time

Database and Network Specialist – AI Trainer – Amazon Store

Remote Full-time

Remote Social Media Strategist (Work from Anywhere) Job at Fud in New York

Remote Full-time