Top Reliability Engineer Jobs in Calgary
Cloud • Security • Software • Cybersecurity • Automation
As an Intermediate Site Reliability Engineer in Environment Automation, you'll automate operations across many GitLab environments, maintain infrastructure reliability using Kubernetes, and enhance IT practices with Terraform and Ansible, while collaborating with senior engineers.
Top Skills:
Ansible, Cloud Services, DevSecOps, GitLab, Go, Infrastructure as Code, Kubernetes, Terraform
Artificial Intelligence • Consumer Web • Digital Media • Information Technology • Social Impact • Software
Lead SRE work to keep Circle highly available and performant: respond to incidents, own monitoring/alerting/log management, manage and optimize MySQL/Postgres/ClickHouse/Redis databases, maintain server infrastructure and deployment pipelines, collaborate with engineering teams, and build internal SRE tooling and automation.
Top Skills:
AWS, ClickHouse, Kubernetes, LLM-Based Tools (Copilots), MySQL, Postgres, Redis
Blockchain • eCommerce • Fintech • Payments • Software • Financial Services • Cryptocurrency
The Senior Site Reliability Engineer will enhance reliability of Block's platform, improve incident response using AI tools, and coordinate incident management. Responsibilities include building reliable systems, standardizing tools, and leading high-severity incidents during on-call rotations.
Top Skills:
Amazon Web Services, Datadog, DynamoDB, gRPC, HTTP, Istio, Java, JSON, Kotlin, Kubernetes, LaunchDarkly, MySQL, Protocol Buffers, Terraform, Vitess
Insurance
As a Reliability Engineer, you'll design, implement, and maintain AWS cloud environments, ensuring systems' reliability and performance, while enhancing monitoring and incident response capabilities.
Top Skills:
AWS, Nginx, Python, Unix, Windows
Artificial Intelligence • Software • Generative AI
The Founding Platform & Reliability Engineer will design and operate reliable, scalable infrastructure for an AI storytelling platform, involving hands-on implementation and strategic decision-making.
Top Skills:
Amplitude, AWS, Cloud Run, Firebase, GCP, Modal, Next.js, Node.js, Python, React, Redis, Sentry, TypeScript, Upstash
Artificial Intelligence • Big Data • Healthtech • Machine Learning • Analytics • Biotech • Generative AI
The Site Reliability Engineer will manage cloud infrastructure, automate tasks, collaborate in agile teams, and ensure service reliability and quality.
Top Skills:
Aurora MySQL, AWS, Azure, Bash, Docker, GCP, Go, Kubernetes, Postgres, Python, Ruby, Terraform
Artificial Intelligence • Cloud • Consumer Web • Productivity • Software • App development • Data Privacy
The Staff Site Reliability Engineer will develop Dropbox's reliability strategy, enhance operational excellence, and lead cross-team initiatives. Responsibilities include improving monitoring and incident response systems, mentoring engineers, and aligning stakeholders on reliability priorities.
Top Skills:
AI-Enabled Software Delivery, Debugging Tools, Distributed Systems, Incident Response, Observability
Information Technology • Software
The Reliability Engineer will develop and implement reliability test plans for IVD medical devices, conduct analyses, and lead cross-functional projects while ensuring compliance with regulatory standards.
Top Skills:
JMP, MATLAB, Minitab, Python, R
eCommerce • Payments • Software
The role focuses on ensuring the reliability, scalability, and operational maturity of production MySQL databases. Responsibilities include managing database operations, improving automation, and collaborating with engineering teams to enhance performance and troubleshoot issues.
Top Skills:
Ansible, Bash, CI/CD, Docker, GCP, Go, JavaScript, Kubernetes, MySQL, Python, Terraform
Artificial Intelligence • Cloud • Information Technology • Software
Contribute to the reliability and performance of Mithril's GPU orchestration platform through automation, observability, and infrastructure management. Collaborate with the team to ensure scalability across multi-cloud environments while maintaining systems stability and implementing SLOs.
Top Skills:
AWS, Azure, GCP, Go, Grafana, Kubernetes, Linux, OpenTelemetry, Prometheus, Pulumi, Python, TCP/IP, Terraform
Artificial Intelligence • Software
The Data Reliability Engineer will enhance the resilience and scalability of data infrastructure, focusing on automation and reliability. Responsibilities include managing data pipelines, operating Kubernetes clusters, and defining observability standards.
Top Skills:
Grafana, Kubernetes, Prometheus, Python, Ray, Terraform
Artificial Intelligence • Software
As a Software Engineer in Reliability, you'll architect and manage multi-cloud GPU infrastructure, ensuring performance, security, and scale while debugging complex hardware/software issues.
Top Skills:
AMD, AWS, Bash, Go, GPU, InfiniBand, Linux, NVIDIA, OCI, Python, RDMA
Blockchain • Financial Services • Cryptocurrency • Web3
Design, build, and operate the infrastructure and platform services that power AI agent workflows in production. Ensure reliability, scalability, observability, and security of model-serving, orchestration, and compute layers. Build APIs, SDKs, IaC (Terraform), CI/CD pipelines, monitoring and incident-response processes, and collaborate with AI, data, and engineering teams to productionize agent prototypes.
Top Skills:
AWS, Bash, CI/CD, Docker, Kubernetes, Python, Terraform
Agency • Information Technology • Professional Services • Software
Lead development and implementation of preventive and predictive maintenance for onshore mechanical equipment, use CMMS to plan and monitor maintenance, analyze reliability data, perform RCA, support operations and maintenance teams, ensure safety and compliance, and recommend improvements to reduce downtime and costs.
Top Skills:
CMMS, Predictive Maintenance, Preventive Maintenance, Root Cause Analysis
Agency • Information Technology • Professional Services • Software
Lead development and implementation of preventive and predictive maintenance programs for offshore mechanical equipment, use CMMS to plan and track work, perform RCA for failures, support offshore teams in troubleshooting, monitor equipment reliability, and ensure compliance with safety and maintenance standards.
Top Skills:
CMMS, Predictive Maintenance, Preventive Maintenance, Root Cause Analysis
Cloud • Software
Operate, maintain and improve the global Tyk Cloud platform: run production Kubernetes clusters, manage cloud infrastructure, automate operations, run on-call incident response, create monitoring and dashboards, conduct post-incident analysis, document SRE processes, and drive reliability, efficiency and multi-region/multi-cloud expansion.
Top Skills:
AWS, Azure, Containers, DNS, EKS, GCP, Go, Grafana, Helm, HTTP, Infrastructure as Code (IaC), Kubernetes, Linux, Logging Collection and Analysis Systems, MongoDB, Prometheus, Python, Rancher, Redis, TCP/IP, Terraform, Thanos, TLS, UDP
Cloud • Security • Software • Generative AI
Lead engineering initiatives to automate and scale Elastic's multi-cloud platform. Build and maintain software, tooling, and automations for reliability; manage Kubernetes at scale; respond to major incidents and drive problem management; collaborate across distributed teams and participate in a follow-the-sun on-call rotation to prevent customer impact.
Top Skills:
Crossplane, Docker, Elastic Cloud, Elastic Stack, Go, InfluxDB, Infrastructure-as-Code, Kubernetes, Linux, Prometheus, Serverless, Terraform
Artificial Intelligence • Cloud • Social Impact • Software • Wearables
Senior SRE focused on building cloud-native platforms, testable automation, and reliability tooling. Partner with Identity and Security to strengthen authentication/authorization, Okta integrations, and compliance. Design tests, write maintainable code (Go/Python), and improve observability and operational practices.
Top Skills:
AKS, APM, AWS, Azure, C#, CI/CD, EKS, Go, IaC, Infrastructure as Code, Java, Kubernetes, Logging, Metrics, Observability Tools, OIDC, Okta, Python, SAML, Secrets Management, Tracing
Software
Operate and maintain production AWS/EKS Kubernetes clusters; design and ship infrastructure-as-code with Terraform; manage Helm charts and ArgoCD GitOps for multi-region SaaS; maintain observability (Grafana, alerting, logs); improve CI/CD pipelines; remediate container and infrastructure CVEs; support compliance (FedRAMP/SOC2/NIST); create runbooks and lead incident response and post-incident reviews.
Top Skills:
Amazon EKS, ArgoCD, AWS, CI/CD, Claude, Docker, GitOps, Grafana, Helm, Kubernetes, Terraform
Internet of Things
Operate and evolve an EKS-based Kubernetes platform, design CI/CD pipelines (GitHub Actions, OIDC), maintain infra-as-code (Pulumi/Terraform/OpenTofu) across AWS accounts, run observability stack, enforce security best practices, diagnose incidents and lead postmortems, participate in on-call rotation, and produce runbooks and documentation.
Top Skills:
Amazon EKS, AWS, AWS IAM, AWS Secrets Manager, External Secrets Operator, GitHub Actions, Grafana, Kubernetes, OIDC, OpenTofu, Pulumi, Terraform, Vector, VictoriaLogs, VictoriaMetrics
Artificial Intelligence • Fintech • Software • Financial Services
The SRE will own reliability for a cloud-native platform, optimizing performance, availability, and observability, while mentoring engineering teams.
Top Skills:
AWS, ClickHouse, Go, Kafka, Kubernetes, Pulumi, Python, Terraform
Cloud • Software
The Senior Site Reliability / GitOps Engineer will drive automation and collaboration within the IS team, enhancing Canonical's IT operations and services while managing infrastructure as code and cloud technologies.
Top Skills:
Cloud Computing, Docker, Elasticsearch, GitOps, Grafana, IaC, Kubernetes, Linux, Prometheus, Python
Cloud • Software
As a Site Reliability / GitOps Engineer, you will automate operations, develop Infrastructure as Code, maintain core services, and collaborate on service architecture.
Top Skills:
CI/CD, Cloud Computing, Elasticsearch, Grafana, Infrastructure as Code, Linux, Prometheus, Python
Cloud • Software
The Site Reliability Engineer will ensure reliable cloud operations by applying Python for infrastructure automation, managing OpenStack and Kubernetes, and practicing DevSecOps in a fast-paced environment.
Top Skills:
Kubernetes, Linux, OpenStack, Python
Cloud • Software
The Senior Site Reliability Engineer will automate operations using Python, manage Kubernetes and OpenStack clusters, and ensure high availability for enterprise infrastructures.
Top Skills:
Kubernetes, Linux, OpenStack, Python