Senior Site Reliability Engineer

ScalePad

Posted 22 Days Ago

In-Office or Remote

Hiring Remotely in Vancouver, BC

Senior level

The Senior Site Reliability Engineer will manage cloud infrastructure, enhance developer productivity, improve operational reliability, and mentor engineering teams.

The summary above was generated by AI

Who We’re Looking For

At ScalePad, we hire thoughtful builders who want their work to matter. Our roles are designed for people who thrive on driving impact, see ambiguity as an opportunity, and believe that raising the bar is a team sport.

We don’t bring people in to run playbooks. We hire people who want to rewrite them. And in this role, you’ll get to do that, while shaping the future of managed services for our global partners. (That’s what we call our customers.)

What is ScalePad

At ScalePad, we’re building more than software; we’re building confidence and clarity for the people who manage the technology businesses rely on every day.

Our mission: help MSPs evolve into MVPs (their clients’ most valuable partner). Our tools turn them from reactive service providers into strategic advisors through a consistent, scalable Customer Success motion.
Our product suite unifies risk insights, client planning, and service delivery so MSPs can have smarter conversations, show clients their value, and grow their revenue.

But our purpose goes beyond our software. We’re creating a workplace where curious, growth-minded people can do their best work, where ideas are valued, progress is shared, and everyone belongs. Together, we’re creating a future where MSPs don’t just keep businesses running, they help them thrive. We believe that when our partners succeed, we all do.

With offices in Vancouver, Toronto, Montreal, and Phoenix and a global-first mindset, ScalePad has grown into a category leader trusted by 12,000+ partners across 60+ countries. We’ve been recognized for our products and corporate culture by MSP Today, G2, and Great Place to Work™, to name a few.

About the role

We’re looking for a Senior Site Reliability Engineer (SRE) to help strengthen and scale our multi-cloud platform and developer experience. This is a hands-on senior individual contributor role for an engineer who enjoys solving complex infrastructure challenges, improving reliability, and helping teams ship and operate software more effectively.

You’ll work closely with engineering leadership and alongside SREs across product domains. Reliability, infrastructure as code, internal tooling, and developer productivity will all be part of your day-to-day focus. You’ll spend your time building, operating, and improving the systems that engineering teams rely on while contributing to best practices and operational excellence across the organization.

What you’ll do

Get ready to go beyond order-taking. Your strategic responsibilities include:

Platform & Infrastructure

Operate production infrastructure across AWS and Azure, including networking, IAM, and cost management
Build and operate Terraform modules and state at scale, keeping our infrastructure as code clean and reviewable
Run Kubernetes in production: upgrades, scaling, troubleshooting, and platform improvements
Operate and improve CI/CD pipelines that the entire engineering org depends on

Reliability & Operational Excellence

Operationalize SLO/SLI frameworks and observability practices alongside the SRE team
Drive incident response practice, on-call tooling, and incident review follow-through
Reduce operational toil through automation across secret rotation, access management, and environment provisioning
Contribute to capacity planning, disaster recovery, and resilience work across critical systems

Developer Experience & Technical Influence

Build and maintain internal developer tooling that removes friction across engineering
Lead rollouts of AI-native tooling for code review, testing, and engineering productivity, e.g., CodeRabbit, Copilot-class assistants, and internal AI workflows
Own migrations and consolidation of internal platforms such as Jira, Confluence, ticketing, and documentation systems
Partner with engineering and product leadership to identify and remove the biggest DX bottlenecks, and align infrastructure and reliability investments with business goals
Mentor engineers and technical leads, fostering growth and knowledge-sharing within the organization
Lead post-mortems and continuous improvement initiatives to strengthen reliability practices

Innovation & Continuous Improvement

Evaluate and introduce new technologies, tools, and approaches to improve scalability and efficiency
Drive standardization and modernization efforts across infrastructure and operational practices
Lead proof-of-concept and experimentation initiatives to validate new reliability solutions

What we’re looking for

We care about what you can do more than where you’ve done it. However, experience in the following areas will help you hit the ground running in this role:

Must-haves

5+ years of experience in software engineering, infrastructure, or related technical disciplines, with a focus on Site Reliability Engineering (SRE), DevOps, Platform Engineering, or similar roles
Strong expertise in cloud infrastructure, distributed systems, networking, and observability practices
Experience designing and operating highly available, scalable production systems
Deep understanding of scripting, automation, infrastructure as code, CI/CD, and operational best practices
Experience implementing SLO/SLI frameworks and reliability engineering methodologies
Incident management, troubleshooting, and on-call experience in complex production environments
Passion for mentoring engineers and improving engineering culture

Nice-to-haves

Experience rolling out AI tooling in an engineering organization
Experience leading tooling and platform migrations such as Jira, Confluence, or observability stacks
Experience with chaos engineering practices and reliability testing
Experience optimizing large-scale cloud infrastructure costs

Perks

ScalePad offers our employees a blend of purpose, growth, and genuinely great perks.

Everyone’s an owner. Share in our success through our Employee Stock Ownership Plan (ESOP) and RRSP matching.
Support for growing families. Parental leave programs are in place to support you and your family when it matters most.
Structured mentorship with builders. Join opt-in mentorship programs and learn directly from founders and senior leaders who’ve scaled multiple SaaS ventures and spent decades in the MSP industry.
Invest in your growth every year. Access an annual professional development budget to level up your skills, your career, and your impact.
Set yourself up with great tools. Work with brand new, top-of-the-line hardware and equipment so you can do your best work, whether you’re at home or in one of our hubs.
Modern ways of working. Roles at ScalePad are structured as remote or hybrid, with hub locations in Vancouver, Toronto, Montreal, and Phoenix. Specific work models are outlined in each posting.
Support for hybrid life. Receive a monthly stipend to help you create an effective hybrid or remote work environment.
Well-being and time to recharge. Take care of yourself with 100% employer-paid benefits.

Before You Apply

This is a full-time role for those who are eligible to work in Canada. We thank all applicants for taking the time to apply, but only candidates who make it to the next stage will be contacted.

Note on AI Use: ScalePad uses AI technology to support certain administrative aspects of our hiring process, such as transcription, note-taking, and interview documentation. These tools are strictly used to assist our team and have no influence on candidate evaluation or hiring decisions.

No recruiters, please.
