Graphcore

AI Hardware Systems Validation Architect

Posted 9 Hours Ago
Hybrid
Austin, TX
Expert/Leader
The AI HW Systems Validation Architect is responsible for the end-to-end validation of AI hardware platforms, ensuring reliability and coverage across multiple domains, and managing the collaboration with engineering teams.
The summary above was generated by AI

About us 

Graphcore is one of the world’s leading innovators in Artificial Intelligence compute. It is developing hardware, software and systems infrastructure that will unlock the next generation of AI breakthroughs and power the widespread adoption of AI solutions across every industry. 

As part of the SoftBank Group, Graphcore is a member of an elite family of companies responsible for some of the world’s most transformative technologies. Together, they share a bold vision: to enable Artificial Super Intelligence and ensure its benefits are accessible to everyone. 

Graphcore’s teams are drawn from diverse backgrounds and bring a broad range of skills and perspectives. A melting pot of AI research specialists, silicon designers, software engineers and systems architects, Graphcore enjoys a culture of continuous learning and constant innovation. 

Job Summary 

We are seeking an experienced AI HW Systems Validation Architect to serve as the technical authority for validation of next-generation AI server and rack-scale platforms. 

This role defines and drives the end-to-end validation architecture across blade-level and rack-level systems. The successful candidate will ensure comprehensive validation coverage across functional, electrical, networking, stress, and thermal domains to enable reliable hyperscale AI infrastructure deployments. 

The Team 

Graphcore is a globally recognised leader in Artificial Intelligence computing systems. The company designs advanced semiconductors and data centre hardware that provide the specialised processing power needed to drive AI innovation, while delivering the efficiency required to support its broader adoption.

The Systems Engineering and Platform Validation team ensures Graphcore’s AI compute platforms are validated and production-ready for hyperscale data center environments. 

The team collaborates closely with silicon enablement, hardware architecture, firmware, system integration, and operations teams to validate complex server and rack-level systems and ensure platform reliability, performance, and scalability. 

Responsibilities and Duties 

  • Own the end-to-end validation methodology and technical strategy for AI hardware platforms across blade-level and rack-level systems. 
  • Drive validation of rack-scale platforms, covering functionality, power, cooling, networking fabric, and system reliability. 
  • Collaborate with rack validation teams to validate full rack configurations, power distribution, cooling loop integration, and system reliability. 
  • Define and lead execution of comprehensive validation test plans for internal teams and ODM validation partners. 
  • Ensure validation coverage aligns with architectural, electrical, and mechanical specifications across CPU, GPU, DDR, PCIe, storage, and networking interfaces. 
  • Oversee liquid cooling validation including performance, leak detection, and long-term reliability of cooling hardware. 
  • Lead debug and issue management across cross-functional engineering teams and external partners. 
  • Establish validation dashboards, coverage metrics, and quality indicators to monitor execution progress. 
  • Partner with architecture, silicon enablement, firmware, and operations teams to ensure robust system bring-up and production readiness. 

Candidate Profile 

Essential 

  • Bachelor’s or Master’s degree in Electrical Engineering, Computer Engineering, or related discipline. 
  • 15+ years of experience in server hardware validation or system engineering. 
  • Proven experience validating board, blade, and rack-level server hardware platforms. 
  • Strong knowledge of high-speed interfaces such as PCIe, CXL, DDR, NVLink, and Ethernet. 
  • Experience developing validation methodologies and large-scale validation test plans. 
  • Experience leading debug and failure analysis across complex systems. 
  • Experience managing ODM validation programs including test planning and issue tracking. 
  • Familiarity with liquid cooling validation and system-level thermal reliability. 

Desirable 

  • Experience with ARM-based or x86 server architectures. 
  • Background in rack integration testing and hyperscale deployment readiness. 
  • Experience with automated validation frameworks and test data analytics. 
  • Strong program leadership and cross-functional collaboration skills. 

Top Skills

AI Hardware
Computer Engineering
CXL
DDR
Electrical Engineering
Ethernet
NVLink
PCIe

Similar Jobs at Graphcore

An Hour Ago
Hybrid
Austin, TX, USA
Senior level
Artificial Intelligence • Semiconductor
Lead validation and quality assurance for firmware stacks on ARM-based servers, including security, functionality, and reliability testing.
Top Skills: Arm, EDK II, GDB, GPIO, I2C, I3C, IPMI, JTAG, Logic Analyzers, MCTP, OpenBMC, PCIe, PLDM, Protocol Analyzers, Redfish, SMBus, SPI, UART, UEFI
4 Hours Ago
Hybrid
2 Locations
Expert/Leader
Artificial Intelligence • Semiconductor
Lead the architecture and development of OpenBMC firmware for AI server platforms, enabling hardware integration, developing security capabilities, and collaborating with teams for reliable firmware delivery.
Top Skills: Bash, BitBake, C, C++, CI/CD, D-Bus, GDB, I3C, I²C, JTAG, MCTP, OpenBMC, PCIe, PLDM, Python, Redfish, SPI, Yocto
4 Hours Ago
Hybrid
2 Locations
Senior level
Artificial Intelligence • Semiconductor
Lead architecture and development of OpenBMC firmware for AI infrastructure, collaborating with partners on reliability, scalability, and serviceability.
Top Skills: Bash, C, C++, CI/CD, DCMI, I2C, I3C, IPMI, Linux, MCTP, NC-SI, OpenBMC, PCIe, PLDM, PMCI, Python, Redfish, SGPIO, SPI, UART, USB, Yocto
