Senior Airflow Reliability Engineer – Infra

Remote Full-time
Astronomer empowers data teams to bring mission-critical software, analytics, and AI to life and is the company behind Astro, the industry-leading unified DataOps platform powered by Apache Airflow®. Astro accelerates building reliable data products that unlock insights, unleash AI value, and power data-driven applications. Trusted by more than 800 of the world’s leading enterprises, Astronomer lets businesses do more with their data. To learn more, visit www.astronomer.io.

About this role

The Astronomer Customer Reliability Engineering (CRE) team is responsible for the success of our customers’ usage of our managed Airflow service. The CREs are responsible for operating, monitoring, and maintaining the platform to ensure availability, predictability, and reliable operations.

As a senior infrastructure specialist within the team, you will focus on the reliability of the underlying cloud infrastructure and Kubernetes clusters. This entails responding to incidents raised either by a customer or by our monitoring system, and then taking further steps to ensure problems are permanently resolved or monitored. As owners of the observability platform, CRE has unlimited potential to improve the reliability of the product and deliver the best possible outcome for our customers.

This role is directly customer-facing and gives exposure to very diverse problems and requirements. CREs get the opportunity to interface with customers from a variety of industries across different cloud providers, all with different expectations. Your contributions will directly impact customers’ success with the Astronomer products, and you will be able to help make meaningful improvements to the customer experience.

What you get to do:
- Provide solutions to customers to make them successful using our products.
- Troubleshoot customer environments and engage in active triaging with customers.
- Participate in the on-call rotation for weekend coverage.
- Provide feedback to the product development teams on customer needs and pain points.
- Build out our monitoring and alerting systems.
- Build and maintain automation to ensure daily operational tasks are handled as efficiently as possible.
- Help direct the architecture of the products and contribute where possible.
- Own the customer experience, working directly with customers to prioritize and solve issues, meet SLAs, and provide “white glove” guidance on the path to production.
- Participate remotely within a fully distributed team.
- Enhance and enrich customer documentation.
- Work with the latest technology and multi-cloud implementations.

What you bring to the role:
- 6 years of experience, preferably with large, complex cloud infrastructures operating at scale
- 4 years of experience with Kubernetes
- Experience managing a production distributed system with at least one major cloud provider (one or all: AWS, GCP, Azure)
- Strong Linux experience
- Knowledge of how to operate distributed systems and monitor them for issues
- Previous experience handling customer issues (internal or external)
- Strong communication skills
- DevOps or CI/CD experience
- Python scripting
- Good troubleshooting skills

Bonus points if you have:
- Experience as a Site Reliability Engineer
- Experience working with Kubernetes Custom Resources
- Depth of knowledge with Azure
- Airflow/Big Data orchestration experience
- IaC experience

At Astronomer, we value diversity. We are an equal opportunity employer: we do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.

