Arena (arena.ai)

Site Reliability Engineer, Platform

Reposted 3 Days Ago
Remote or Hybrid
Hiring Remotely in CA
Senior level
About Arena Intelligence

Arena Intelligence is the open platform for evaluating how AI models perform in the real world. Created by researchers from UC Berkeley’s SkyLab, our mission is to measure and advance the frontier of AI for real-world use.

Millions of people use Arena Intelligence each month to explore how frontier systems perform — and we use our community’s feedback to build transparent, rigorous, and human-centered model evaluations. Leading enterprises and AI labs rely on our evaluations to understand real-world reliability, alignment, and impact. Our leaderboards are the gold standard for AI performance — trusted by leaders across the AI community and shaping the global conversation on model reliability and progress.

We’re a team of researchers, engineers, academics, and builders from places like UC Berkeley, Google, Stanford, DeepMind, and Discord. We seek truth, move fast, and value craftsmanship, curiosity, and impact over hierarchy. We’re building a company where thoughtful, curious people from all backgrounds can do their best work. Everyone on our team is a deep expert in their field — our office radiates excellence, energy, and focus.

About the Role

Arena Intelligence is seeking a Site Reliability Engineer to own the reliability, performance, and operational security of the platform that millions of people depend on to evaluate frontier AI. This is the first dedicated SRE hire on the team — you'll build observability, incident response, and infrastructure hardening practices from scratch while also owning the CI/CD and developer tooling that keeps our engineering team moving fast.

Our stack runs on Vercel (Next.js, Hono API on Nitro), Supabase (Postgres, GoTrue auth), Cloudflare (Workers, R2, bot management), and AWS (CloudFront, Lambda). You'll work across the full request path — from edge-layer DDoS mitigation to auth hardening to production monitoring — partnering closely with security and product engineering to keep the platform fast, reliable, and resilient under adversarial traffic conditions.

You’ll
  • Harden auth infrastructure against volumetric attacks — edge-layer rate limiting in front of Supabase GoTrue, connection pool tuning, token caching, and origin shielding so DDoS traffic is filtered before it reaches the database

  • Extend CloudFront WAF rules and Cloudflare Worker bot management to cover auth endpoints and close gaps in application-layer rate limiting

  • Define and implement SLOs/SLIs across the full request path — CDN edge through serverless functions to Supabase

  • Build monitoring, alerting, and dashboards on top of existing Datadog and PostHog instrumentation that surface degradations before users notice them

  • Collaborate with security engineering to ensure clean handoff between edge-layer defenses and application-layer anti-abuse systems

  • Own and improve CI/CD pipelines (GitHub Actions, Turborepo) and expand infrastructure-as-code (Terraform) across cloud environments

  • Proactively load-test and stress-test infrastructure, model its capacity limits, and drive cost optimization across our multi-cloud footprint

  • Enhance developer workflows to make building, testing, and deploying faster and more reliable

  • Mentor engineers across the company on building reliable, performant, and observable systems

You’ll have
  • 6+ years of experience in SRE, platform engineering, or infrastructure engineering, including operating production systems at scale (millions of users / billions of requests)

  • Direct experience mitigating DDoS attacks and configuring edge security — WAF rules, CDN architecture, rate limiting, and traffic analysis

  • Hands-on experience building observability systems (Datadog, Grafana, Prometheus, or similar) and running incident response processes

  • Strong understanding of auth infrastructure under adversarial load — connection pooling, token caching, and rate limiting on login/signup endpoints

  • Experience with serverless architectures and managed platforms — you know how to make them reliable and observable at scale

  • Experience with infrastructure-as-code (Terraform, Pulumi) and CI/CD pipeline design

  • Track record of collaborating with security and product engineering to deliver both foundational systems and user-facing reliability improvements

Bonus Experience
  • Experience with Vercel, Supabase (GoTrue, Supavisor), Cloudflare Workers, or CloudFront specifically

  • Experience with Node.js, TypeScript, Python, or Go in production backend environments

  • Background in platforms with voting, reputation, or community-driven systems

  • Experience being the first or early infrastructure hire at a startup

  • Experience hardening auth systems under load (OAuth, JWT, PKCE flows, connection pooling)

What we offer
  • Competitive compensation and equity aligned to the markets where our team members are based. The base salary range depends on the candidate’s permanent work location.

  • Comprehensive health and wellness benefits, including medical, dental, vision, and additional support programs.

  • The opportunity to work on cutting-edge AI with a small, mission-driven team

  • A culture that values transparency, trust, and community impact

Come help build the space where anyone can explore and help shape the future of AI.

Arena Intelligence provides equal employment opportunities (EEO) to all employees and applicants for employment without regard to race, color, religion, sex, national origin, age, disability, genetics, sexual orientation, gender identity, or gender expression. We are committed to a diverse and inclusive workforce and welcome people from all backgrounds, experiences, perspectives, and abilities.
