itwinai - AI on cloud and HPC made simple for science
itwinai is a Python toolkit developed in the interTwin project to support large-scale Digital Twin applications in science.
Lecture materials and exercises on MDP, SARSA, Q-learning, and Deep RL (DQN) using Gymnasium.
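The core update covered in these exercises can be sketched in a few lines. The snippet below is a minimal, dependency-free illustration of tabular Q-learning. Instead of a Gymnasium environment it uses a hypothetical six-cell chain (`N_STATES`, `step`) invented for this example, and the hyperparameters are arbitrary choices rather than values taken from the lecture materials.

```python
import random

# Hypothetical toy environment (not from the lecture materials): a 1-D chain
# of N_STATES cells. The agent starts in cell 0 and receives reward 1.0 on
# reaching the right-most cell, which ends the episode.
N_STATES = 6
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    if action == 1:
        next_state = min(state + 1, N_STATES - 1)
    else:
        next_state = max(state - 1, 0)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon, and break ties at random.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action)
            # Off-policy target: bootstrap from the greedy value of the next state.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Greedy action for every non-terminal state."""
    return [0 if row[0] > row[1] else 1 for row in q[:-1]]
```

Calling `greedy_policy(q_learning())` should yield a policy that moves right in every non-terminal cell. Replacing the bootstrap term with the value of the action actually taken in the next state would turn the same loop into SARSA.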
Energy-Based Models (EBMs) are a relatively recent technique for density estimation. In this university project I explore this research topic and implement EBMs as generative models, comparing the results obtained with Maximum Likelihood estimation and Sliced Score Matching on MNIST and a toy 2D dataset.
In this project, carried out at EURECOM, I delve into the theory of Graph Convolutional Networks and explore solutions for anomaly detection on very large financial graphs.
Self-supervised Domain Adaptation between real and synthetic (generated) RGB-D images for robotic vision.
Angular web app for visualizing daily Covid updates, hosted on Google Cloud Platform.
Published in PoliTO, 2022
As cyberspace grows more complex, malware authors strive to exploit the growing number of vulnerabilities. This requires security researchers to invest more effort in developing automated malware analysis tools able to cope with the increasing pace of suspicious binaries. The Achilles heel of automated malware analysis tools is evasive malware, which may put countless strategies in place to impede analysis. For instance, malware may try to detect that it is under observation in an analysis environment and, as a consequence, conceal its malicious behavior by performing only innocuous operations. Evasive malware can hinder analysis through either static code obfuscation (e.g. via packers or crypters) or dynamic evasion (e.g. sandbox and debugger evasion). The goal of this thesis is to explore the application of Reinforcement Learning (RL) to dynamic analysis, to reduce the burden of exhaustive exploration of conditional paths. To this end, we develop a model that is capable of noticing new evasive schemes via RL. State-of-the-art approaches generally identify evasions through fingerprinting, which is not effective in identifying slight mutations of evasion schemes. In contrast, this work employs a language model to abstract out the syntax of binary code while preserving its semantics. Furthermore, to improve over state-of-the-art solutions, the presented solution considers both evasion schemes and the malicious nature of regions of code protected by evasive conditions, to better distinguish true evasive behaviors from false positives. As a consequence, this method is easily extended to guide the search of…
Recommended citation: Matteo Bunino (2022). "Reinforcement Learning-aided Dynamic Analysis of Evasive Malware." https://webthesis.biblio.polito.it/secure/22588/1/tesi.pdf
Published in arXiv preprint arXiv:2207.05669, 2022
Graph Convolutional Networks (GCNs) have proven to be a powerful concept, successfully applied to a large variety of tasks across many domains in recent years. In this work we study the theory that paved the way to the definition of GCNs, including related parts of classical graph theory. We also discuss and experimentally demonstrate key properties and limitations of GCNs, such as those caused by the statistical dependency of samples, introduced by the edges of the graph, which biases the estimates of the full gradient. Another limitation we discuss is the negative impact of minibatch sampling on model performance. As a consequence, during parameter updates gradients are computed on the whole dataset, undermining scalability to large graphs. To address this, we investigate alternative methods that allow good parameters to be learned safely while sampling only a subset of the data per iteration. We reproduce the results reported in the work of Kipf et al. and propose an implementation inspired by SIGN, a sampling-free minibatch method. Finally, we compare the two implementations on a benchmark dataset, showing that they are comparable in terms of prediction accuracy for the task of semi-supervised node classification.
Recommended citation: Matteo Bunino (2022). "From Spectral Graph Convolutions to Large Scale Graph Convolutional Networks." arXiv preprint arXiv:2207.05669. https://arxiv.org/pdf/2207.05669
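For readers unfamiliar with the layer studied in this work, the propagation rule of Kipf and Welling, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), can be sketched as follows. This is a minimal pure-Python illustration on dense lists, not the implementation used in the paper; the helper names (`normalized_adjacency`, `matmul`, `gcn_layer`) and the three-node path graph are assumptions made for the example.

```python
import math


def normalized_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a dense adjacency matrix."""
    n = len(adj)
    # Add self-loops so that each node also keeps its own features.
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    # Node degrees of the graph with self-loops.
    deg = [sum(row) for row in a_tilde]
    return [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Dense matrix product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(a_hat, features, weights):
    """One graph convolution: ReLU(a_hat @ features @ weights)."""
    z = matmul(matmul(a_hat, features), weights)
    return [[max(0.0, v) for v in row] for row in z]


# Toy example (hypothetical): a path graph 0 - 1 - 2 with one-hot features.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
a_hat = normalized_adjacency(adj)
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# With identity features and weights the layer output equals a_hat itself.
hidden = gcn_layer(a_hat, identity, identity)
```

Note that `a_hat` couples every node to its neighbours, so a full-batch layer touches the whole graph at once, which is the scalability concern raised in the abstract.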
Published in SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2024
Scientific workflows and provenance are two sides of the same coin. While the former addresses the coordinated execution of multiple tasks over a set of computational resources, the latter relates to the historical record of data from its original sources. This paper highlights the importance of tracking multi-level provenance metadata in complex, AI-based scientific workflows as a way to (i) foster and (ii) expand documentation of experiments, (iii) enable reproducibility, (iv) address interpretability of the results, (v) facilitate the diagnosis of performance bottlenecks, and (vi) advance provenance exploration and analysis opportunities.
Recommended citation: G. Padovani et al., "A software ecosystem for multi-level provenance management in large-scale scientific workflows for AI applications," SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, Atlanta, GA, USA, 2024, pp. 2024-2031, doi: 10.1109/SCW63240.2024.00253 https://doi.org/10.1109/SCW63240.2024.00253
Published in 27th International Conference on Computing in High Energy and Nuclear Physics (CHEP 2024), 2025
itwinai is a Python library designed to facilitate scalable AI workflows on High-Performance Computing (HPC) systems. By abstracting complex engineering tasks, it reduces overhead and enables seamless scaling of AI models across diverse infrastructures. Deployed on HPC systems, such as Jülich and Vega, itwinai allows users to switch between distributed machine learning (ML) frameworks, including PyTorch DDP, DeepSpeed, and Horovod, through a unified interface. The library integrates mechanisms for profiling computational efficiency, tracking key metrics, such as GPU utilization and power consumption, and optimizing hyperparameters at scale. Additionally, it provides transparent offloading of compute-intensive tasks from the cloud to HPC systems and supports model and pipeline parallelism to accommodate large-scale models. A continuous integration and deployment pipeline ensures reproducibility and compatibility across environments. In the interTwin project, itwinai has been utilized in physics and climate research use cases, demonstrating its potential to enhance sustainability and computational efficiency. Its integration with leading scientific computing centers highlights its role in advancing AI-driven digital twins and addressing large-scale scientific challenges.
Recommended citation: Bunino, Matteo, Jarl Sondre Sæther, Anna Elisa Lappe, Maria Girone, and Kalliopi Tsolaki. “Itwinai: Enabling Scalable AI Workflows on HPC for Digital Twins in Science.” EPJ Web of Conferences 337 (2025): 01361. https://doi.org/10.1051/epjconf/202533701361
Published:
This talk was given on the first day of the 2023 CERN openlab Technical Workshop and introduced interTwin, an EC-funded project developing a Digital Twins Engine (DTE) for scientific applications, with the goal of alleviating the engineering burden on researchers and benefitting the scientific community.
Published:
This talk was given in collaboration with Maria Girone, Head of CERN openlab, to introduce the work of CERN openlab and the lecture programme to the students selected for the CERN openlab summer student programme 2023.
Published:
A crash course on Reinforcement Learning (RL) delivered to CERN openlab summer students as part of the CERN openlab summer student lecture programme, with the goal of explaining the potential of RL and its real-world applications in particle accelerators at CERN.
Published:
This talk was given on the first day of the 2024 CERN openlab Technical Workshop and introduced interTwin, an EC-funded project developing a Digital Twins Engine (DTE) for scientific applications, with the goal of alleviating the engineering burden on researchers and benefitting the scientific community.
Published:
This talk was given on the second day of the ISC 2024 conference, in a session shared among experts from industry (Google, PGS) and academia (CERN, DFKI, RWTH), and drove a discussion on current strategies for integrating cloud and HPC infrastructure and tailoring it to the needs of different user bases.
Published:
This poster was presented at the PASC24 conference, hosted at ETH Zurich, and summarized our main contributions to the interTwin project in terms of large-scale scientific AI workflows on EuroHPC resources, to support digital twin applications in Physics and Climate Research.
Published:
A hands-on crash course on Reinforcement Learning (RL) delivered to CERN openlab summer students as part of the CERN openlab summer student lecture programme, with the goal of explaining the potential of RL and its real-world applications in particle accelerators at CERN. The course focused in particular on learning by doing, solving exercises together to better grasp the sometimes abstract concepts of RL.
Published:
This live stream was organized by Diego Ciangottini (INFN) and showcased a demo on how to employ interLink, a software component developed in the interTwin project to enable cloud-HPC integration. Matteo showed how interLink can be used to offload compute-intensive AI workflows used in scientific applications to remote HPC resources, with a particular focus on maintaining consistent ML metadata across different computing infrastructures using MLflow.
Published:
This talk was given on the second day of the 2025 CERN openlab Technical Workshop and showcased my work on scientific digital twins.
Published:
First time at KubeCon + CloudNativeCon Europe 2025 in London. Great vibe, friendly crowd, and thousands of folks from the cloud-native world!
Published:
I organized and moderated a Birds-of-a-Feather session at International Supercomputing 2025. I invited speakers from NVIDIA, CINES, ECMWF, SURF, and CERN to share their work on scientific digital twins and discuss with the audience the challenges and opportunities on the path to next-generation digital twins in science.
Published:
This poster was presented at the PASC25 conference and summarized our work integrating itwinai with a physics use case that applies generative AI methods (normalizing flows) to lattice quantum chromodynamics (QCD).
Published:
This poster was presented at the PASC25 conference and summarized our work on NVIDIA Omniverse targeting interactive visualizations of physics events generated with Geant4.
Published:
This poster was presented at the PASC25 conference and summarized our work on itwinai, a Python package for large-scale scientific AI workflows on HPC resources, to support digital twin applications in Physics and Climate Research.
Published:
A crash course on Reinforcement Learning (RL) delivered to CERN openlab summer students as part of the CERN openlab summer student lecture programme, with the goal of explaining the potential of RL and its real-world applications in particle accelerators at CERN.