Testing AI Containers for Digital Twins in Science: A Cloud-HPC Workflow
First time at KubeCon + CloudNativeCon Europe 2025 in London. Great vibe, friendly crowd, and thousands of folks from the cloud-native world!
I also got to share our work on wiring remote HPC clusters into CI/CD, so we can automatically test AI/ML containers on real, specialised hardware, bridging cloud and SLURM to make sure what we ship runs where it matters.
Talk abstract
CERN is advancing the development of AI-based digital twins in science through projects like interTwin, an EC-funded initiative to develop a digital twin engine for science. These digital twins rely on HPC resources for training multi-node, multi-GPU models using containerized workflows. Developing such containers for HPC systems presents unique challenges, including accessing restricted HPC resources and integrating with HPC software stacks, while ensuring interoperability between different container runtimes. We introduce a CI/CD workflow that bridges cloud and HPC and enables automated testing of AI/ML containers on the same SLURM-managed clusters where they will be deployed. By integrating Dagger’s reproducible CI runtime with HPC offloading, this approach validates both the software in the containers and their compatibility with HPC environments. This ensures the seamless deployment of AI-based digital twins, addressing the critical need for robust testing in hybrid environments.
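
To give a flavour of what "HPC offloading" means in practice, here is a minimal sketch of the general idea: a CI job submitting a containerized test as a SLURM batch job and reporting the result back. This is not the actual interTwin/Dagger pipeline; the login node, partition, container image, and test command are placeholder assumptions, and it presumes passwordless SSH to a SLURM login node with Apptainer available on the compute nodes.

```python
#!/usr/bin/env python3
"""Sketch: offload a container test from a CI runner to a SLURM cluster.

Placeholder hostnames, image, and commands; not the interTwin implementation.
"""
import subprocess
import time

LOGIN_NODE = "hpc-login.example.org"                # placeholder SLURM login node
IMAGE = "docker://ghcr.io/example/ai-container:ci"  # placeholder container image

# Batch script that runs the container's test suite on a GPU node.
# Partition, GRES, and module names are site-specific assumptions.
BATCH_SCRIPT = f"""#!/bin/bash
#SBATCH --job-name=ci-container-test
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
#SBATCH --time=00:30:00
#SBATCH --output=ci-container-test-%j.out

module load apptainer
apptainer exec --nv {IMAGE} pytest /app/tests
"""


def ssh(*cmd: str, stdin: str = "") -> str:
    """Run a command on the SLURM login node over SSH and return its stdout."""
    result = subprocess.run(
        ["ssh", LOGIN_NODE, *cmd],
        input=stdin, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def main() -> None:
    # Submit the batch script via stdin; --parsable makes sbatch print only the job ID.
    job_id = ssh("sbatch", "--parsable", stdin=BATCH_SCRIPT)
    print(f"Submitted SLURM job {job_id}")

    # Poll the accounting database until the job reaches a terminal state.
    while True:
        state = ssh("sacct", "-n", "-X", "-o", "State", "-j", job_id) or "PENDING"
        if state.split()[0] not in {"PENDING", "RUNNING", "COMPLETING"}:
            break
        time.sleep(30)

    print(f"Job {job_id} finished with state: {state}")
    if "COMPLETED" not in state:
        raise SystemExit(1)  # propagate the failure back to the CI pipeline


if __name__ == "__main__":
    main()
```

In the workflow described in the talk, this offloading step sits inside a Dagger pipeline, so the same containerized tests run reproducibly in the cloud CI runtime and on the target SLURM cluster.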
Video recording
If you have trouble viewing the recording above, watch it on YouTube.
Find more info on this talk here.
