Lead global compute capacity and platform strategy for training and inference: plan multi-year capacity, manage vendor/cloud partnerships, direct infrastructure and datacenter teams, optimize cluster efficiency (>50% MFU), oversee large capital deployments, and serve as executive liaison to silicon vendors and hyperscalers to enable world-model and robotics workloads.
The Role
Compute is the ultimate physical and financial prerequisite for the robotics foundation models we are building. This role owns Luma’s global compute footprint end to end, bridging macro capacity strategy, multi-million-dollar capital allocation, and top-tier systems architecture. You will design our scaling roadmap from the silicon up, ensuring our research and robotics teams have the uninterrupted runway they need to ship frontier world models. As a member of the executive team, you will be the single person responsible for turning capital into capability.
What You'll Do
- Architect Multi-Year Compute Strategy: Lead capacity planning, global vendor and cloud partnerships, on-prem vs. cloud mix, and accelerator supply chain roadmaps (H/B-series GPUs, custom silicon evaluation).
- Direct the Platform Org: Provide strategic leadership to our infrastructure, distributed systems, and datacenter operations teams—scaling the organization to support next-generation compute demands.
- Maximize Fleet Utilization: Oversee the architectural efficiency of our cluster configurations to deliver >50% Model FLOPs Utilization (MFU) on flagship training runs.
- Command a Megawatt Budget: Negotiate, secure, and operate our largest-scale capital deployments for compute infrastructure, partnering directly with Finance to optimize unit economics and risk management.
- Unify Global Capacity: Champion the platform strategy that enables world-model training, heavy simulation rollouts, and real-time on-robot inference to seamlessly share a single, elastic fleet.
- Act as Principal Executive Interface: Serve as the primary commercial and strategic bridge to NVIDIA, AMD, hyperscalers, and frontier silicon vendors.
Qualifications:
- 10+ years of engineering leadership experience in large-scale distributed systems, infrastructure, or technical supply chain, with a proven track record of leading compute platform strategy at a frontier AI lab, hyperscaler, or major autonomy program.
- Deep technical and commercial fluency in high-performance cluster topology, high-speed interconnects (InfiniBand/RoCE), large-scale data systems, and the economics of distributed training architectures.
- Direct operational oversight of environments with 10k+ accelerators in high-performance production settings.
Preferred Qualifications:
- Scale Credentials: Experience orchestrating capital or infrastructure for training runs at the >100B-parameter or >100k-GPU-day scale.
- Robotics/Autonomy Context: Familiarity with the unique capacity and latency demands of edge-to-cloud inference and real-time autonomous systems.
The base pay range for this role is $250,000 – $450,000 per year.
About Luma
Luma’s mission is to build unified general intelligence that can generate, understand, and operate in the physical world.
We believe that multimodality is critical for intelligence. To go beyond language models and build more aware, capable, and useful systems, we believe the next step-function change will come from vision. So we are training and scaling up multimodal foundation models for systems that can see and understand, show and explain, and eventually interact with our world to effect change.