Advanced Site Reliability/DevOps Engineer - (Multiple positions available)

Remote Full-time
About the position
Specialize in developing scalable methods for building, deploying, and supporting cloud, on-prem, and store-focused enterprise services and systems. Work closely with Software Engineers to deploy and operate solutions, automate and streamline processes, build and maintain deployment tools, monitor the platform, and troubleshoot and resolve issues in all environments while guiding and mentoring other members of the team. Design and build infrastructure and systems that provide high levels of scalability, reliability, and performance for Kroger's stack, while balancing security, maintainability, and operational excellence. Work with the engineering team to continuously implement and improve reliable and speedy build environments for DEV & QA, provide timely build status updates, and automate as much as possible to improve efficiency and quality. Promote innovation, outside-of-the-box thinking, teamwork, and self-organization.
Ensure traceability, observability, and retrievability of system behavior. Build logging, monitoring, and alerting systems to identify bottlenecks and assist with debugging, analysis, and optimization in cloud, on-prem, and store environments. Improve operational efficiency through automation and the deployment or development of new tools. Experiment with and recommend new technologies that simplify or improve Kroger's stack. Craft solid, clearly explained designs, playbooks, and documentation for consumption by teammates and the larger engineering organization. Determine methods and procedures on new assignments; may coordinate the activities of other personnel. Participate in an off-hours on-call rotation and perform periodic off-hours work during maintenance windows. Duties may be located at any Kroger Co. office throughout the U.S. Telecommuting from a home office is authorized pursuant to company policy.
Responsibilities
• Develop scalable methods for building, deploying, and supporting cloud, on-prem, and store-focused enterprise services and systems.
• Work closely with Software Engineers to deploy and operate solutions.
• Automate and streamline processes; build and maintain tools for deployment.
• Monitor the platform and troubleshoot and resolve issues in all environments.
• Guide and mentor other members of the team.
• Design and build infrastructure and systems for scalability, reliability, and performance.
• Implement and improve reliable and speedy build environments for DEV & QA.
• Provide timely build status updates and automate processes to improve efficiency and quality.
• Promote innovation, teamwork, and self-organization.
• Ensure traceability, observability, and retrievability of system behavior.
• Build logging, monitoring, and alerting systems to identify bottlenecks.
• Improve operational efficiency through automation and development of new tools.
• Experiment with and recommend new technologies.
• Craft designs, playbooks, and documentation for teammates and the engineering organization.
• Determine methods and procedures on new assignments and coordinate activities of other personnel.
• Participate in an off-hours on-call rotation and perform periodic off-hours work during maintenance windows.

Requirements
• Bachelor's degree in Computer Science or a closely related STEM field plus at least 6 years of experience in cloud Site Reliability Engineering, DevOps, or Infrastructure, OR a Master's degree in Computer Science or a closely related STEM field plus at least 3 years of experience in cloud Site Reliability Engineering, DevOps, or Infrastructure.
• 3+ years of experience with messaging technologies such as Kafka, RabbitMQ, or SQS.
• 3+ years of experience with infrastructure software tools such as Ansible or Terraform.
• 3+ years of experience with containerization tools such as Docker or Kubernetes.
• 3+ years of experience with CI/CD using Jenkins, Spinnaker, Azure DevOps, or TeamCity.
• 3+ years of experience managing system observability using ELK, Datadog, New Relic, Azure Monitor, or Grafana.
• 2+ years of experience implementing automation and monitoring using shell scripting and other related tools.
• Any amount of experience with always-on, high-volume web server stacks, Azure/GCP PaaS and Azure/Google networking, and provisioning native Managed Apps & CI/CD pipelines.
• Any amount of experience supporting omni-channel experiences.
