Top Reliability Engineer Jobs in Calgary

Reposted 13 Days Ago
In-Office or Remote
Calgary, AB
Expert/Leader
Artificial Intelligence • Software
As a Software Engineer in Reliability, you'll architect and manage multi-cloud GPU infrastructure, ensuring performance, security, and scale while debugging complex hardware/software issues.
Top Skills: AMD, AWS, Bash, Go, GPU, InfiniBand, Linux, NVIDIA, OCI, Python, RDMA
15 Days Ago
Remote
Calgary, AB
Senior level
Blockchain • Financial Services • Cryptocurrency • Web3
Design, build, and operate the infrastructure and platform services that power AI agent workflows in production. Ensure reliability, scalability, observability, and security of model-serving, orchestration, and compute layers. Build APIs, SDKs, IaC (Terraform), CI/CD pipelines, monitoring and incident-response processes, and collaborate with AI, data, and engineering teams to productionize agent prototypes.
Top Skills: AWS, Bash, CI/CD, Docker, Kubernetes, Python, Terraform
16 Days Ago
In-Office or Remote
Calgary, AB
Expert/Leader
Agency • Information Technology • Professional Services • Software
Lead development and implementation of preventive and predictive maintenance for onshore mechanical equipment, use CMMS to plan and monitor maintenance, analyze reliability data, perform RCA, support operations and maintenance teams, ensure safety and compliance, and recommend improvements to reduce downtime and costs.
Top Skills: CMMS, Predictive Maintenance, Preventive Maintenance, Root Cause Analysis
16 Days Ago
In-Office or Remote
Calgary, AB
Expert/Leader
Agency • Information Technology • Professional Services • Software
Lead development and implementation of preventive and predictive maintenance programs for offshore mechanical equipment, use CMMS to plan and track work, perform RCA for failures, support offshore teams in troubleshooting, monitor equipment reliability, and ensure compliance with safety and maintenance standards.
Top Skills: CMMS, Predictive Maintenance, Preventive Maintenance, Root Cause Analysis
Reposted 16 Days Ago
Remote
Calgary, AB
Senior level
Cloud • Software
Operate, maintain and improve the global Tyk Cloud platform: run production Kubernetes clusters, manage cloud infrastructure, automate operations, run on-call incident response, create monitoring and dashboards, conduct post-incident analysis, document SRE processes, and drive reliability, efficiency and multi-region/multi-cloud expansion.
Top Skills: AWS, Azure, Containers, DNS, EKS, GCP, Go, Grafana, Helm, HTTP, Infrastructure as Code (IaC), Kubernetes, Linux, Logging Collection and Analysis Systems, MongoDB, Prometheus, Python, Rancher, Redis, TCP/IP, Terraform, Thanos, TLS, UDP
Reposted 17 Days Ago
Remote
Calgary, AB
Senior level
Artificial Intelligence • Cloud • Social Impact • Software • Wearables
Senior SRE focused on building cloud-native platforms, testable automation, and reliability tooling. Partner with Identity and Security to strengthen authentication/authorization, Okta integrations, and compliance. Design tests, write maintainable code (Go/Python), and improve observability and operational practices.
Top Skills: AKS, APM, AWS, Azure, C#, CI/CD, EKS, Go, IaC, Infrastructure as Code, Java, Kubernetes, Logging, Metrics, Observability Tools, OIDC, Okta, Python, SAML, Secrets Management, Tracing
Reposted 18 Days Ago
Remote
Calgary, AB
Senior level
Artificial Intelligence • Fintech • Software • Financial Services
The SRE will own reliability for a cloud-native platform, optimizing performance, availability, and observability, while mentoring engineering teams.
Top Skills: AWS, ClickHouse, Go, Kafka, Kubernetes, Pulumi, Python, Terraform
Reposted 20 Days Ago
In-Office or Remote
Calgary, AB
Senior level
Cloud • Software
The Senior Site Reliability / GitOps Engineer will drive automation and collaboration within the IS team, enhancing Canonical's IT operations and services while managing infrastructure as code and cloud technologies.
Top Skills: Cloud Computing, Docker, Elasticsearch, GitOps, Grafana, IaC, Kubernetes, Linux, Prometheus, Python
Reposted 20 Days Ago
In-Office or Remote
Calgary, AB
Mid level
Cloud • Software
As a Site Reliability / GitOps Engineer, you will automate operations, develop Infrastructure as Code, maintain core services, and collaborate on service architecture.
Top Skills: CI/CD, Cloud Computing, Elasticsearch, Grafana, Infrastructure as Code, Linux, Prometheus, Python
Reposted 20 Days Ago
In-Office or Remote
Calgary, AB
Mid level
Cloud • Software
The Site Reliability Engineer will ensure reliable cloud operations by using Python for infrastructure automation, managing OpenStack and Kubernetes, and practicing DevSecOps in a fast-paced environment.
Top Skills: Kubernetes, Linux, OpenStack, Python
Reposted 20 Days Ago
In-Office or Remote
Calgary, AB
Senior level
Cloud • Software
The Senior Site Reliability Engineer will automate operations using Python, manage Kubernetes and OpenStack clusters, and ensure high availability for enterprise infrastructures.
Top Skills: Kubernetes, Linux, OpenStack, Python
23 Days Ago
Remote
Calgary, AB
Senior level
Artificial Intelligence • Information Technology • Software • Database
As a Site Reliability Engineer, you will design, implement, and maintain scalable infrastructure, ensure system reliability, automate processes, and collaborate with engineering teams.
Top Skills: Docker, ELK Stack, Go, Grafana, Java, Kubernetes, Node.js, Prometheus, Pulumi, Python, Ruby, Terraform
Reposted 23 Days Ago
In-Office or Remote
Calgary, AB
Senior level
Artificial Intelligence • Cloud • Information Technology • Software
Design and operate large-scale GPU infrastructure for distributed AI training, ensuring reliability, performance, and efficient customer partnerships.
Top Skills: Ansible, CUDA, DeepSpeed, FSDP, GPU, Helm, InfiniBand, Kubernetes, Linux, Megatron, NCCL, NVIDIA A100, NVIDIA B200, NVIDIA H100, NVLink, PyTorch, RoCE, Terraform
Reposted 23 Days Ago
In-Office or Remote
Calgary, AB
Senior level
Artificial Intelligence • Cloud • Information Technology • Software
The Site Reliability Engineer will provision and manage Kubernetes clusters, build automation tools, debug customer issues, and improve infrastructure reliability.
Top Skills: Ansible, Bash, Datadog, Go, Grafana, Helm, Kubernetes, Loki, Prometheus, Python, Terraform
Reposted 24 Days Ago
In-Office or Remote
Calgary, AB
Senior level
Software
The Senior Site Reliability Engineer will lead service onboarding, maintain SLAs/SLOs, design secure infrastructure, automate operational tasks, and respond to incidents while ensuring system reliability and performance.
Top Skills: AWS, CloudFormation, ELK Stack, Go, Grafana, Hadoop, Kubernetes, Python, Terraform