AI Infrastructure Co-Design with Chakra
Full Stack & Cloud Stage
—
30 minutes
Artificial Intelligence
Machine Learning
Software Development
Building AI supercomputers through hardware/software co-design is now industry-standard practice, yet a fundamental question remains unanswered: how do you inject realistic workload behavior into the design process to meaningfully evaluate components and tradeoffs across the design space?
This talk introduces Chakra, an open-standard ecosystem under MLCommons that is changing how the industry approaches AI performance benchmarking and co-design. At its core is a portable, graph-based execution trace format that captures what a distributed AI workload actually does: compute, memory, communication, and dependencies. This lets hardware and software teams reason about performance without exposing proprietary model details. Chakra is already in use at NVIDIA, AMD, Meta, Google, and Keysight.

We will cover:
- The case for co-design and the common frictions
- How Chakra traces are collected and used for replay, emulation, simulation, and validation
- InfraGraph (infragraph.dev): the open standard for infrastructure topology that completes the other half of the equation
- Storage as the next open frontier in AI workload benchmarking
Attendees will leave with:
- A mental model for how leading AI infrastructure teams approach system co-design and validation
- An introduction to open tools (Chakra, InfraGraph, ASTRA-sim) already shaping next-generation hardware
- An honest view of where the hard problems still live, and how to get involved through MLCommons

This talk draws from research accepted at MLSys 2026 and ongoing open-source work under MLCommons.