The Senior Site Reliability Engineer will manage cloud infrastructure, improve reliability, enhance security, and collaborate with teams to design reliable systems.
Your Career, our Future—Together.
Ready to join something big? At SoundHound AI, we bring voice, generative, and conversational AI together to transform how people interact with products and services. From voice-enabled vehicles to food ordering and customer support, our multilingual, omnichannel technology already impacts hundreds of millions worldwide.
The Opportunity
This is a high-ownership role with direct influence over infrastructure decisions. The team has a clear roadmap focused on improving reliability, security posture, and operational maturity. The Senior Site Reliability Engineer helps build first-class infrastructure to deliver our best-in-class technology to the world. The infrastructure is large and complex, running in the cloud and on Kubernetes, so there's no shortage of interesting problems.
What You'll Do
- Build software and systems for cloud infrastructure management and automation (Terraform, Ansible, Oracle Cloud, GCP)
- Participate in developing frameworks for application deployment, customization, and upgrades (Kubernetes, ArgoCD, Vault, Jenkins)
- Ensure application and infrastructure security complies with ISO 27001 / SOX / PCI
- Improve observability, implement and measure key metrics, and define and enforce SLOs/SLAs (Prometheus, Grafana, ELK)
- Collaborate with engineering, quality engineering, and product management to architect and build highly available, reliable, and secure systems
What You'll Bring
- 8 years of experience working with cloud services at scale in a high-volume, customer-facing environment, plus a Bachelor's degree in Computer Science or equivalent
- Willingness to participate in an on-call rotation
- Extensive experience with Linux environments, security, and networking, along with scripting in Python, Go, or Bash
- Extensive experience with monitoring and alerting tools such as Prometheus, Grafana, the ELK stack, and PagerDuty
- Experience with deployments in cloud technologies and architectures, CI/CD tools, and configuration management such as Ansible, Terraform, and Kubernetes
- Proficiency with a wide range of relevant server-side technologies such as Consul, Vault, Kafka, MongoDB, PostgreSQL, and MySQL
- A pragmatic, problem-solving approach to designing and implementing solutions
Workplace & Compensation
This role is available throughout Canada. Employees within a 100-kilometer radius of our Toronto office are expected to work from the office on three pre-scheduled “core days” each month to encourage cross-team connection and in-person collaboration.
Compensation includes salary, equity, comprehensive healthcare, paid time off, and other benefits. Our recruiting team will provide a specific salary range based on location and years of experience.
Let's Start the Conversation
Join SoundHound AI and collaborate with colleagues worldwide who are shaping the future of voice AI. Guided by our values—supportive, open, undaunted, nimble, and determined to win—we strive to build breakthrough AI experiences together.
We provide reasonable accommodations for individuals with disabilities throughout the hiring process and employment. To view our job applicant privacy policy, please visit https://static.soundhound.com/corpus/ta/applicantprivacynotice.html.
Discover more about our philosophy, benefits, and culture at https://www.soundhound.com/careers.
***Please beware of agency recruiters falsely stating that they represent SoundHound AI on job posts. Our job post above will note if we are utilizing a specific agency to assist with the search. Our recruiters use @soundhound.com email addresses exclusively.

