Top Reliability Engineer Jobs in Calgary

16 Days Ago
In-Office or Remote
Calgary, AB
Senior level
Artificial Intelligence • Fintech • Information Technology • Logistics • Payments • Business Intelligence • Generative AI
Lead design, automation, and maintenance of cloud-based database infrastructure (primarily SQL Server and MySQL). Improve reliability with monitoring, HA/DR, automation, troubleshooting, on-call support, and mentoring of junior engineers while collaborating across teams.
Top Skills: Aurora, AWS, Bash, Failover Clustering, MySQL, New Relic, Orchestrator, PMM, Python, RDS, Ruby, SQL Server, VividCortex
Reposted 12 Days Ago
Hybrid
Calgary, AB
Mid level
Artificial Intelligence • Big Data • Healthtech • Machine Learning • Analytics • Biotech • Generative AI
The Site Reliability Engineer will manage cloud infrastructure, automate tasks, collaborate in agile teams, and ensure service reliability and quality.
Top Skills: Aurora MySQL, AWS, Azure, Bash, Docker, GCP, Go, Kubernetes, Postgres, Python, Ruby, Terraform
Reposted 3 Days Ago
Remote
Calgary, AB
Senior level
Artificial Intelligence • Cloud • Consumer Web • Productivity • Software • App development • Data Privacy
The Staff Site Reliability Engineer will develop Dropbox's reliability strategy, enhance operational excellence, and lead cross-team initiatives. Responsibilities include improving monitoring and incident response systems, mentoring engineers, and aligning stakeholders on reliability priorities.
Top Skills: AI-Enabled Software Delivery, Debugging Tools, Distributed Systems, Incident Response, Observability
4 Days Ago
Easy Apply
Remote or Hybrid
Calgary, AB
Senior level
eCommerce • Healthtech • Kids + Family • Retail • Social Media
Own and evolve Babylist's AWS infrastructure, Terraform IaC, Kubernetes/EKS clusters, CI/CD, and observability for a platform serving millions. Lead incident response, improve developer tooling, and set reliability standards across engineering teams.
Top Skills: AWS, CDN, CircleCI, Cloud Networking, Cronitor, Datadog, DNS, EKS, GitHub Actions, Kubernetes, Load Balancers, MySQL, PagerDuty, RDS, Redis, Ruby on Rails, Sentry, Sidekiq, Terraform
Reposted 3 Days Ago
Remote
Calgary, AB
Expert/Leader
Information Technology • Software
The Reliability Engineer will develop and implement reliability test plans for IVD medical devices, conduct analyses, and lead cross-functional projects while ensuring compliance with regulatory standards.
Top Skills: JMP, MATLAB, Minitab, Python, R
4 Days Ago
In-Office or Remote
Calgary, AB
Senior level
Aerospace
Lead reliability strategy and environmental validation for satellite user terminal hardware. Design and run HALT/HASS/ESS tests, oversee weatherproofing and thermal/vibration testing, and perform failure analysis (X-ray, microscopy, cross-sectioning). Own DFMEA and physics-of-failure models to predict MTBF and warranty risk, and collaborate with electrical and mechanical teams and external labs to drive hardware improvements and regulatory compliance.
Top Skills: CFD, Cross-Sectioning, DFMEA, Electronic Thermal Cycle Systems, Environmental Chambers, ESS, FEA, HALT, HASS, IP67, JMP, Microscopy, Minitab, Power Delivery Network (PDN) Analysis, ReliaSoft, Salt Fog Corrosion Testing, UV Exposure Testing, Vibration/Shaker Tables, Weibull Life-Data Analysis, X-Ray Inspection
11 Days Ago
Easy Apply
Remote
Calgary, AB
Mid level
Big Data • Fintech • Mobile • Payments • Financial Services
Design and build a centralized reliability platform for production systems, integrating distributed-systems engineering with AI-assisted tools. Implement AI agents for incident triage, log/trace summarization, and developer-facing APIs. Own projects end-to-end and collaborate with product, infra, data, and SRE teams to iterate and improve system health and debuggability.
Top Skills: Claude, Cursor, Distributed Systems, GitHub Copilot, LLMs, Python
Reposted 12 Days Ago
In-Office or Remote
Calgary, AB
Senior level
Blockchain • eCommerce • Fintech • Payments • Software • Financial Services • Cryptocurrency
The Senior Site Reliability Engineer will enhance reliability of Block's platform, improve incident response using AI tools, and coordinate incident management. Responsibilities include building reliable systems, standardizing tools, and leading high-severity incidents during on-call rotations.
Top Skills: Amazon Web Services, Datadog, DynamoDB, gRPC, HTTP, Istio, Java, JSON, Kotlin, Kubernetes, LaunchDarkly, MySQL, Protocol Buffers, Terraform, Vitess
Reposted 23 Days Ago
Easy Apply
Remote
Calgary, AB
Mid level
Cloud • Security • Software • Cybersecurity • Automation
As an Intermediate Site Reliability Engineer in Environment Automation, you'll automate operations across many GitLab environments, maintain infrastructure reliability using Kubernetes, and enhance IT practices with Terraform and Ansible, while collaborating with senior engineers.
Top Skills: Ansible, Cloud Services, DevSecOps, GitLab, Go, Infrastructure as Code, Kubernetes, Terraform
24 Days Ago
Easy Apply
Remote
Calgary, AB
Senior level
Artificial Intelligence • Consumer Web • Digital Media • Information Technology • Social Impact • Software
Lead SRE work to keep Circle highly available and performant: respond to incidents, own monitoring/alerting/log management, manage and optimize MySQL/Postgres/ClickHouse/Redis databases, maintain server infrastructure and deployment pipelines, collaborate with engineering teams, and build internal SRE tooling and automation.
Top Skills: AWS, ClickHouse, Kubernetes, LLM-Based Tools (Copilots), MySQL, Postgres, Redis
19 Days Ago
Remote
Calgary, AB
Mid level
Information Technology • Software • Database • Automation
Owner of on-prem reliability and escalations: reproduce and resolve L2/L3 issues across heterogeneous Kubernetes environments, build diagnostics and automation, improve CI and e2e test stability, establish performance baselines, harden install/upgrade flows, and write tooling in Python/Go/Rust to reduce repeat incidents.
Top Skills: Benchmarking, CI, CI/CD, Containers, E2E Testing, Go, Health Checks, Helm, Installers, Integration Testing, Kubernetes, Load Generation, Logs, Metrics, Networking, Observability, Packaging, Profiling, Python, RBAC, Rust, Storage, Support Bundles, Traces
Reposted 22 Days Ago
Remote
Calgary, AB
Junior
Insurance
As a Reliability Engineer, you'll design, implement, and maintain AWS cloud environments, ensuring systems' reliability and performance, while enhancing monitoring and incident response capabilities.
Top Skills: AWS, Nginx, Python, Unix, Windows
Reposted 22 Days Ago
In-Office or Remote
Calgary, AB
Senior level
Artificial Intelligence • Software • Generative AI
The Founding Platform & Reliability Engineer will design and operate reliable, scalable infrastructure for an AI storytelling platform, involving hands-on implementation and strategic decision-making.
Top Skills: Amplitude, AWS, Cloud Run, Firebase, GCP, Modal, Next.js, Node.js, Python, React, Redis, Sentry, TypeScript, Upstash
Reposted 5 Hours Ago
Remote
Calgary, AB
Senior level
Artificial Intelligence • Cloud • Social Impact • Software • Wearables
Senior SRE focused on building cloud-native platforms, testable automation, and reliability tooling. Partner with Identity and Security to strengthen authentication/authorization, Okta integrations, and compliance. Design tests, write maintainable code (Go/Python), and improve observability and operational practices.
Top Skills: AKS, APM, AWS, Azure, C#, CI/CD, EKS, Go, IaC, Infrastructure as Code, Java, Kubernetes, Logging, Metrics, Observability Tools, OIDC, Okta, Python, SAML, Secrets Management, Tracing
Reposted Yesterday
Remote
Calgary, AB
Senior level
Artificial Intelligence • Fintech • Software • Financial Services
The SRE will own reliability for a cloud-native platform, optimizing performance, availability, and observability, while mentoring engineering teams.
Top Skills: AWS, ClickHouse, Go, Kafka, Kubernetes, Pulumi, Python, Terraform
Reposted 3 Days Ago
In-Office or Remote
Calgary, AB
Senior level
Cloud • Software
The Senior Site Reliability / Gitops Engineer will drive automation and collaboration within the IS team, enhancing Canonical's IT operations and services while managing infrastructure as code and cloud technologies.
Top Skills: Cloud Computing, Docker, Elasticsearch, GitOps, Grafana, IaC, Kubernetes, Linux, Prometheus, Python
Reposted 3 Days Ago
In-Office or Remote
Calgary, AB
Mid level
Cloud • Software
As a Site Reliability / Gitops Engineer, you will automate operations, develop Infrastructure as Code, maintain core services, and collaborate on service architecture.
Top Skills: CI/CD, Cloud Computing, Elasticsearch, Grafana, Infrastructure as Code, Linux, Prometheus, Python
Reposted 3 Days Ago
In-Office or Remote
Calgary, AB
Mid level
Cloud • Software
The Site Reliability Engineer will ensure reliable cloud operations by applying Python for infrastructure automation, managing OpenStack and Kubernetes, and practicing DevSecOps in a fast-paced environment.
Top Skills: Kubernetes, Linux, OpenStack, Python
Reposted 3 Days Ago
In-Office or Remote
Calgary, AB
Senior level
Cloud • Software
The Senior Site Reliability Engineer will automate operations using Python, manage Kubernetes and OpenStack clusters, and ensure high availability for enterprise infrastructures.
Top Skills: Kubernetes, Linux, OpenStack, Python
Reposted 4 Days Ago
In-Office or Remote
Calgary, AB
Senior level
Artificial Intelligence • Cloud • Information Technology • Software
Design and operate large-scale GPU infrastructure for distributed AI training, ensuring reliability, performance, and efficient customer partnerships.
Top Skills: Ansible, CUDA, DeepSpeed, FSDP, GPU, Helm, InfiniBand, Kubernetes, Linux, Megatron, NCCL, NVIDIA A100, NVIDIA B200, NVIDIA H100, NVLink, PyTorch, RoCE, Terraform
Reposted 4 Days Ago
In-Office or Remote
Calgary, AB
Senior level
Artificial Intelligence • Cloud • Information Technology • Software
The Site Reliability Engineer will provision and manage Kubernetes clusters, build automation tools, debug customer issues, and improve infrastructure reliability.
Top Skills: Ansible, Bash, Datadog, Go, Grafana, Helm, Kubernetes, Loki, Prometheus, Python, Terraform
5 Days Ago
Remote
Calgary, AB
Senior level
Information Technology • Software
Maintain and automate large-scale bare-metal infrastructure using MAAS/Ironic and Linux; design networks; automate provisioning (PXE/MAAS/Cloud-init) with Ansible/Bash/Python; deploy observability (Prometheus/Grafana, ELK/Graylog/Loki); integrate APIs and manage virtualization platforms like OpenStack, Proxmox, or VMware.
Top Skills: Ansible, Bash, BIOS, Cloud-init, Cloudflare API, Debian, ELK, Git, Grafana, Graylog, IPMI, Ironic, Kolla-Ansible, L2 Routing, L3 Routing, Linux, Loki, MAAS, OpenStack, Preseed, Prometheus, Proxmox VE, PXE, Python, RAID, Ubuntu, UniFi, VLAN, VMware ESXi, VPN
5 Days Ago
Remote
Calgary, AB
Senior level
Information Technology • Software
Maintain and automate large-scale bare-metal infrastructure using MAAS/Ironic and Linux. Design networks (VLANs, L2/L3, VPNs), provision servers (PXE/Preseed/Cloud-init), build observability (Prometheus/Grafana, ELK/Graylog/Loki), integrate APIs, and operate virtualization platforms (OpenStack/Proxmox/VMware).
Top Skills: Ansible, Bash, BIOS, Cloud-init, Cloudflare API, Container Orchestration, Debian, ELK, Git, GitOps, Grafana, Graylog, IPMI, Ironic, Kolla-Ansible, L2 Routing, L3 Routing, Linux, Loki, MAAS, OpenStack, Preseed, Prometheus, Proxmox VE, PXE, Python, RAID, Ubuntu, UniFi, VLANs, VMware ESXi, VPNs
5 Days Ago
In-Office or Remote
Calgary, AB
Senior level
Cloud • Security • Software • Cybersecurity
Lead reliability for global load-balancing infrastructure: build observability pipelines, define SLOs/SLIs, lead incident response, automate deployment safety, review designs, and develop tooling (Python/Go) and IaC to reduce operational toil.
Top Skills: Ansible, Containerization, Feature Flags, Go, Kubernetes, L4 Load Balancing, L7 Load Balancing, Linux, Nb/Nlb, Netstat, Observability, Python, SaltStack, Tcpdump, Terraform
Reposted 5 Days Ago
Remote
Calgary, AB
Senior level
Artificial Intelligence • Information Technology • Software • Database
As a Site Reliability Engineer, you will design, implement, and maintain scalable infrastructure, ensure system reliability, automate processes, and collaborate with engineering teams.
Top Skills: Docker, ELK Stack, Go, Grafana, Java, Kubernetes, Node.js, Prometheus, Pulumi, Python, Ruby, Terraform