itwinai: Enabling Scalable AI Workflows on HPC for Digital Twins in Science
Published in 27th International Conference on Computing in High Energy and Nuclear Physics (CHEP 2024), 2025
itwinai is a Python library designed to facilitate scalable AI workflows on High-Performance Computing (HPC) systems. By abstracting complex engineering tasks, it reduces overhead and enables seamless scaling of AI models across diverse infrastructures. Deployed on HPC systems such as those at Jülich and Vega, itwinai allows users to switch between distributed machine learning (ML) frameworks, including PyTorch DDP, DeepSpeed, and Horovod, through a unified interface. The library integrates mechanisms for profiling computational efficiency, tracking key metrics such as GPU utilization and power consumption, and optimizing hyperparameters at scale. Additionally, it provides transparent offloading of compute-intensive tasks from the cloud to HPC systems and supports model and pipeline parallelism to accommodate large-scale models. A continuous integration and deployment pipeline ensures reproducibility and compatibility across environments. In the interTwin project, itwinai has been applied to physics and climate research use cases, demonstrating its potential to enhance sustainability and computational efficiency. Its integration with leading scientific computing centers highlights its role in advancing AI-driven digital twins and addressing large-scale scientific challenges.
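To give a feel for the kind of abstraction the abstract describes, the sketch below shows how a unified distributed-training interface might look. The class and function names (DistributedStrategy, TorchDDPStrategy, get_strategy) are hypothetical and chosen for illustration only; they are not itwinai's actual API. Only the PyTorch DDP path is implemented, under the assumption that the script is launched with torchrun so the distributed environment variables are set.

```python
# Illustrative sketch only: hypothetical names, not the actual itwinai API.
# It mimics the idea of selecting a distributed backend (PyTorch DDP,
# DeepSpeed, or Horovod) behind one uniform interface, so the training
# loop itself stays unchanged when the backend is swapped.
import torch
from torch import nn


class DistributedStrategy:
    """Minimal common interface a unified wrapper could expose."""

    def init(self) -> None:
        # Set up the process group / backend-specific runtime.
        raise NotImplementedError

    def distribute_model(self, model: nn.Module) -> nn.Module:
        # Wrap the model for distributed training.
        raise NotImplementedError


class TorchDDPStrategy(DistributedStrategy):
    def init(self) -> None:
        torch.distributed.init_process_group(backend="nccl")

    def distribute_model(self, model: nn.Module) -> nn.Module:
        local_rank = torch.distributed.get_rank() % torch.cuda.device_count()
        return nn.parallel.DistributedDataParallel(
            model.cuda(local_rank), device_ids=[local_rank]
        )


def get_strategy(name: str) -> DistributedStrategy:
    # DeepSpeed and Horovod variants would plug in here behind the same interface.
    strategies = {"ddp": TorchDDPStrategy()}
    return strategies[name]


if __name__ == "__main__":
    # Launch with: torchrun --nproc_per_node=<N> this_script.py
    strategy = get_strategy("ddp")  # swapping "ddp" for another backend name
    strategy.init()                 # would leave the rest of the code intact
    model = strategy.distribute_model(nn.Linear(16, 1))
```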
Recommended citation: Bunino, Matteo, Jarl Sondre Sæther, Anna Elisa Lappe, Maria Girone, and Kalliopi Tsolaki. “Itwinai: Enabling Scalable AI Workflows on HPC for Digital Twins in Science.” EPJ Web of Conferences 337 (2025): 01361. https://doi.org/10.1051/epjconf/202533701361
