Site Reliability Engineer - Platform Engineering

Sorry, this job was removed at 06:34 p.m. (MST) on Tuesday, May 06, 2025

In-Office

7 Locations

Job Title: Site Reliability Engineer (SRE), Platform Engineering

About Us: WitnessAI is a leader in providing innovative networking solutions designed to enhance security, performance, and reliability for businesses of all sizes. We are seeking a highly skilled Site Reliability Engineer (SRE) with a strong background in Linux administration, AWS, and Kubernetes for our Platform Engineering team. The ideal candidate will help ensure the reliability, scalability, and performance of our systems while driving a culture of automation and continuous improvement.

Key Responsibilities

System Reliability & Operations

Maintain and improve the reliability, availability, and performance of our services and infrastructure.
Monitor system health, troubleshoot issues, and respond to incidents with a focus on reducing mean time to recovery (MTTR).

Infrastructure Management

Administer and optimize Linux-based systems across development, staging, and production environments.
Design and manage scalable, secure, and cost-effective solutions on AWS.
Build, maintain, and monitor Kubernetes clusters to support containerized applications.

Automation & Tooling

Develop and maintain CI/CD pipelines to streamline deployments.
Automate operational tasks using tools such as Terraform, Crossplane, or custom scripts.
Create and enhance monitoring, alerting, and logging systems to improve observability.
Build ad-hoc, reusable automation solutions where required.

Collaboration & Best Practices

Partner with engineering teams to integrate SRE principles into the software development lifecycle.
Advocate for best practices in incident response, post-mortem reviews, and capacity planning.
Share knowledge with team members and contribute to a culture of continuous improvement.

Security & Compliance

Implement security best practices for cloud and containerized environments.
Ensure compliance with organizational and industry standards.

Requirements

Technical Skills

Proven expertise in Linux system administration (e.g., Ubuntu, CentOS, or similar).
Deep understanding of AWS services and architecture (e.g., EC2, S3, RDS, VPC, IAM).
Strong experience managing Kubernetes clusters in production.
Hands-on experience with infrastructure-as-code tools like Terraform or CloudFormation.
Proficiency in scripting or programming languages (e.g., Python, Bash, or Go).
Demonstrated experience in app development for backend automation solutions.
3+ years of experience as a Site Reliability Engineer, DevOps Engineer, or in a similar role at a SaaS or cloud-based company.

Operational Expertise

Familiarity with monitoring and logging tools such as Prometheus, Grafana, ELK, or Datadog.
Experience designing and maintaining CI/CD pipelines (e.g., Jenkins, GitLab CI, or CircleCI).
Understanding of networking concepts (e.g., DNS, load balancing, firewalls).

Problem Solving & Collaboration

Strong analytical and troubleshooting skills.
Ability to work effectively in a collaborative, team-oriented environment.
Excellent written and verbal communication skills.

Education

Bachelor’s degree in Computer Science, Engineering, or equivalent work experience.

Nice-to-Have Skills:

Experience with service meshes and other CNCF technologies (e.g., Istio or Linkerd).
Knowledge of database systems (e.g., MySQL, PostgreSQL, or NoSQL databases).
Familiarity with cloud-native technologies and tools (e.g., Helm, ArgoCD, Spinnaker).

Benefits:

Hybrid work environment.
Competitive salary.
Health, dental, and vision insurance.
401(k) plan.
Opportunities for professional development and growth.
Generous vacation policy.

Salary range:

$170,000-$200,000
