Senior Software Engineer - Site Reliability Engineering Remote / Telecommute Jobs

Remote Full-time
Company Overview

Noctua Technology, Inc. is a software engineering and consulting corporation focused on data engineering, machine learning, and cloud technologies. We specialize in delivering premier quality software engineering solutions to Public Sector and Commercial customers across the US.

Department Overview

The Site Reliability Engineering discipline at Noctua Technology, Inc. is a strategic force driving digital transformation. We treat operations as a software engineering challenge, focusing on the seamless integration, scalability, and long-term reliability of cloud-native systems. Our SREs don’t just manage infrastructure; they build it using Infrastructure as Code (IaC), monitor it through advanced observability stacks, and protect it by engineering for failure. We work closely with clients to bridge the gap between development and operations.

Job Summary

We are seeking a highly experienced and autonomous Senior Site Reliability Engineer (SRE) to join our dynamic team. As a technical leader, you will define the strategy and apply advanced software engineering principles to operations, focusing on the architecture, reliability, and long-term performance of large-scale production systems. You will play a crucial role in reducing toil through automation, defining and monitoring Service Level Objectives (SLOs), and implementing best practices for system stability and incident response. This role requires working with modern cloud technologies to ensure the high availability and efficiency of applications and infrastructure.

Security Clearance Requirement: Applicants must be US citizens and eligible to obtain and maintain an active Secret security clearance or above.

Key Responsibilities

● Site Reliability Engineering

○ Drive the definition and adoption of SLIs and SLOs across multiple services or entire platforms, ensuring alignment with business goals.

○ Design and architect Infrastructure as Code (IaC) solutions for large-scale, complex environments, establishing standards and best practices.

○ Implement and manage containerized and serverless architectures using Docker, Kubernetes, and cloud-native services, focusing on performance and error budgets.

○ Build and maintain reliable and self-healing CI/CD pipelines to automate deployments and improve development workflows.

● Toil Reduction and Incident Management

○ Implement and refine comprehensive monitoring, alerting, and logging to detect and address performance and availability issues proactively.

○ Lead the strategic effort to eliminate toil, identifying and championing major automation projects that deliver significant organizational efficiency.

○ Lead high-severity incident response and coordinate blameless postmortems for major outages, driving the resulting remediation and systemic improvements.

● Testing and Service Resiliency

○ Implement cloud security best practices, including identity and access management (IAM), encryption, and compliance controls.

○ Proactively identify and address system weaknesses and ensure performance under stress.

○ Support disaster recovery and high availability strategies through backup and failover planning.

● Collaboration and Knowledge Sharing

○ Serve as a primary SRE liaison for development teams, influencing application architecture and design to meet reliability and scalability targets from inception.

○ Create and maintain documentation for cloud architectures, deployment processes, and best practices.

○ Contribute to internal knowledge-sharing initiatives, ensuring continuous learning within the team.

● Stakeholder Communication

○ Act as a subject matter expert and trusted advisor to clients and internal leadership on cloud infrastructure, reliability strategy, and Service Level Agreement (SLA) negotiations.

○ Act on client feedback to refine and enhance cloud solutions.

○ Conduct training and knowledge-sharing sessions to help clients manage their cloud environments effectively.

● Continuous Learning and Innovation

○ Stay updated on the latest developments in cloud infrastructure and technology trends.

○ Drive innovation by proposing and implementing new techniques and technologies.

Qualifications

● 5+ years of experience in site reliability engineering, cloud engineering, or related fields.

● Strong software engineering skills with an emphasis on writing clean, modular, and maintainable code, specifically for automation and system management.

● Deep experience with Infrastructure as Code (IaC) tools like Terraform or CloudFormation.

● Deep experience with containerization and orchestration tools like Docker and Kubernetes.

● Deep knowledge of networking concepts, cloud security best practices, and identity management.

● Experience with programming or scripting languages such as Python, Bash, or Go.

● Experience with CI/CD pipelines and DevOps methodologies.

● Strong problem-solving skills and the ability to troubleshoot complex cloud environments.

● Demonstrated ability to influence technical decision-making across organizational boundaries.

Preferred Qualifications

● Bachelor's or advanced degree in Computer Science or a related field.

● Any of the below cloud certifications:

○ Google Cloud Professional Cloud Architect

○ Google Cloud Professional Cloud DevOps Engineer

○ AWS Certified Solutions Architect

○ AWS Certified Developer

○ AWS Certified SysOps Administrator

