Interdisciplinary projects
Machine learning for optimisation
We are interested in applying machine learning to optimise complex systems where an exact solution is either computationally infeasible or brittle with respect to changes in conditions. We are particularly interested in problems with a large space of possible solutions, and in those affected by known and unknown uncertainties. This leads us to consider reinforcement learning as the main pillar of our technique.
What is reinforcement learning and why is it so powerful?
Reinforcement learning is a machine learning technique for optimising behaviour. The scientist specifies what the agent should do via a reward function, which encodes which outcomes the agent should or should not achieve. The agent is then trained to maximise the total reward it obtains, which is done by optimising a policy. The policy is defined within the framework of a Markov decision process, which determines which action to take based on the current information (state). Reinforcement learning is a powerful tool for automatic decision making in complex systems. It is particularly advantageous for systems with a vast space of possible actions, where brute-force methods would be infeasible.
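The ingredients above (reward function, policy, state) can be made concrete with a minimal sketch: tabular Q-learning on a toy five-state chain, where the agent must learn to walk right to reach a rewarding goal state. The environment and all constants here are illustrative inventions, not part of our research systems.

```python
import random

# States 0..4 on a chain; actions 0 (left) and 1 (right).
# The reward function: reaching state 4 yields reward 1, everything else 0.
N_STATES, GOAL = 5, 4
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

def step(state, action):
    """Deterministic chain dynamics: move left or right; reward on reaching the goal."""
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL

q = [[0.0, 0.0] for _ in range(N_STATES)]   # Q-values: expected total reward per (state, action)
random.seed(0)
for _ in range(500):                         # training episodes
    s, done = 0, False
    while not done:
        # Epsilon-greedy: mostly follow the current policy, sometimes explore.
        a = random.randrange(2) if random.random() < EPSILON else max((0, 1), key=lambda x: q[s][x])
        s2, r, done = step(s, a)
        # Temporal-difference update towards the reward-maximising policy.
        q[s][a] += ALPHA * (r + GAMMA * max(q[s2]) - q[s][a])
        s = s2

policy = [max((0, 1), key=lambda x: q[s][x]) for s in range(N_STATES)]
print(policy)   # the learned policy moves right (action 1) in every non-terminal state
```

The policy emerges purely from the reward signal: no example games or labelled decisions were ever provided, which is exactly what distinguishes this setting from supervised learning.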
Consider the game of chess, where the number of possible games is larger than the number of atoms in the universe. A brute-force, memory-based approach could never work here, so one must instead devise a heuristic that generalises to games the agent has never seen before. Given a reward for winning games, a reinforcement learning agent can learn a winning policy at superhuman level.
Bringing reinforcement learning to the real world
Reinforcement learning is extremely powerful, but it can only learn from a model of the environment or from the data it is given for training. If the data go out of date (the problem is not stationary), or the model of the environment differs from reality, the agent can make sub-optimal decisions. This matters for real-world applications, where the system often features uncertainties, shifting distributions, data scarcity and multiple objectives. Making reinforcement learning robust against these issues is our main research focus, with distributional reinforcement learning and distributionally robust reinforcement learning among the most promising directions.
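The robust idea can be illustrated with a deliberately tiny sketch (not our actual method): value iteration on a two-state problem where the success probability of a "risky" action is only known to lie in an interval. The robust Bellman update evaluates each action under the worst case in that interval, so the resulting policy hedges against model error. All numbers here are invented for illustration.

```python
GAMMA = 0.9
P_SUCCESS_RANGE = (0.5, 0.9)   # assumed uncertainty set for the risky action

def robust_value_iteration(n_iters=200):
    # State 0 is "operating"; state 1 is "failed" (absorbing, zero reward).
    v = [0.0, 0.0]
    for _ in range(n_iters):
        # Safe action: stay in state 0 for a small, certain reward.
        q_safe = 0.4 + GAMMA * v[0]
        # Risky action: large reward on success, absorbing failure otherwise.
        # Robust update: worst case over the interval; the value is linear in p,
        # so it suffices to check the two endpoints.
        q_risky = min(
            p * (1.0 + GAMMA * v[0]) + (1 - p) * GAMMA * v[1]
            for p in P_SUCCESS_RANGE
        )
        v = [max(q_safe, q_risky), 0.0]
    return v, ("risky" if q_risky > q_safe else "safe")

v, policy = robust_value_iteration()
print(policy)   # the robust agent prefers the safe action
```

Under the nominal model (success probability 0.9) a standard agent would pick the risky action; planning against the worst case in the uncertainty set flips the decision, which is the essence of distributionally robust control.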
Causal inference and discovery for epidemiology
One promising avenue is contributing to the perennial problem of disentangling causation from correlation in epidemiology. While this can be done with randomised controlled trials, in which a control group is compared to a treatment group, these are very expensive and often statistically limited. One can instead use observational data from medical centres, which are much larger in size but suffer from biases due to selection pressure. For example, the propensity to take a treatment depends on variables that also affect health outcomes. In this setting it is very difficult to determine whether taking a treatment improved a health outcome, all else being equal.
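The selection problem can be demonstrated on fully synthetic data (invented here purely for illustration): a hidden frailty variable makes patients both more likely to take the treatment and more likely to have a poor outcome. The naive difference in means then suggests the treatment is harmful, while inverse-propensity weighting, using the (here, known by construction) treatment propensities, recovers the true positive effect.

```python
import random

random.seed(1)
TRUE_EFFECT = 1.0                            # treatment improves the outcome by +1
rows = []
for _ in range(100_000):
    frail = random.random() < 0.5            # confounder: frail patients
    p_treat = 0.8 if frail else 0.2          # frail patients seek treatment more often
    treated = random.random() < p_treat
    # Frailty lowers the outcome by 2; treatment raises it by TRUE_EFFECT.
    outcome = (-2.0 if frail else 0.0) + (TRUE_EFFECT if treated else 0.0) + random.gauss(0, 0.5)
    rows.append((treated, p_treat, outcome))

# Naive estimate: difference in group means (biased by the confounder).
naive = (sum(y for t, p, y in rows if t) / sum(1 for t, p, y in rows if t)
         - sum(y for t, p, y in rows if not t) / sum(1 for t, p, y in rows if not t))

# Inverse-propensity weighting: reweight each patient by 1 / P(their treatment).
n = len(rows)
ipw = (sum(y / p for t, p, y in rows if t) / n
       - sum(y / (1 - p) for t, p, y in rows if not t) / n)
print(f"naive: {naive:.2f}, IPW: {ipw:.2f}")   # naive is negative; IPW is close to +1
```

In real observational studies the propensities are of course unknown and must themselves be estimated, which is where the causal-structure questions discussed next become essential.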
The mathematical building block for describing these causal relationships is the directed acyclic graph (DAG): a graph with directed edges and no directed cycles, meaning one cannot start at a node and follow the edges back to the same node. A causal relationship between two variables A and B is represented by two nodes with a directed edge between them. The problem is that with many variables, including a time dependence, the number of possible DAGs describing a causal system becomes very large, so an efficient algorithm for exploring the DAGs that best match the data becomes highly valuable. This is where we use reinforcement learning: to explore the space of possible DAGs efficiently and choose the one that best matches our data. We are also involved in causal inference, where we use graph neural networks to describe the causal system and explore counterfactuals. This is vitally important for determining the effect of a treatment.
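Two small, standard computations make the combinatorial point concrete: checking that a candidate graph is acyclic (Kahn's topological sort), and counting labelled DAGs via Robinson's recurrence, which shows why exhaustive search over causal structures is hopeless even for a handful of variables.

```python
from math import comb

def is_dag(n, edges):
    """True if the directed graph on nodes 0..n-1 has no directed cycle (Kahn's algorithm)."""
    indeg = [0] * n
    out = [[] for _ in range(n)]
    for u, v in edges:
        out[u].append(v)
        indeg[v] += 1
    queue = [u for u in range(n) if indeg[u] == 0]   # nodes with no incoming edges
    seen = 0
    while queue:
        u = queue.pop()
        seen += 1
        for v in out[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return seen == n          # every node peeled off => no cycle remained

def count_dags(n):
    """Number of labelled DAGs on n nodes, via Robinson's recurrence."""
    a = [1]                    # a(0) = 1: the empty graph
    for m in range(1, n + 1):
        a.append(sum((-1) ** (k + 1) * comb(m, k) * 2 ** (k * (m - k)) * a[m - k]
                     for k in range(1, m + 1)))
    return a[n]

print(is_dag(3, [(0, 1), (1, 2)]))           # chain A -> B -> C: True
print(is_dag(3, [(0, 1), (1, 2), (2, 0)]))   # directed cycle: False
print(count_dags(5))                         # 29281 structures for just 5 variables
```

With 10 variables the count already exceeds 4 × 10^18, so any practical causal-discovery method must search the space guided by the data rather than enumerate it.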
Supply chain and logistics
This research activity addresses the fundamental challenges of optimisation and decision-making in modern supply-chain and logistics networks, which are characterised by high dimensionality, stochastic dynamics, and limited data availability. Conventional approaches such as deterministic programming or machine-learning models trained on historical data often fail to generalise when market conditions, regulations, or technologies change. Our work aims to overcome these limitations by combining optimal transport theory with reinforcement learning in a mathematically rigorous framework. The approach replaces the notion of a single detailed “digital twin” with an ensemble of simplified simulations, each constrained by optimal-transport bounds that define the admissible range of uncertainty. This formulation allows learning agents to identify control policies
that remain effective under unforeseen disruptions and distributional shifts, while remaining computationally tractable for large-scale, multi-echelon systems. Graph neural networks are employed to represent the topology of interconnected supply nodes and transport links, enabling the model to propagate local constraints through the global network structure and to capture emergent collective behaviours.
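To make the optimal-transport bounds less abstract, here is a minimal sketch (illustrative only, with invented lead-time data): in one dimension, the Wasserstein-1 distance between two equal-size empirical samples reduces to the mean absolute difference of their sorted values. A distance of this kind can quantify how far a simplified simulation drifts from observed data, defining the admissible range of uncertainty described above.

```python
def wasserstein_1d(xs, ys):
    """W1 (optimal transport) distance between two equal-size empirical samples."""
    assert len(xs) == len(ys)
    xs, ys = sorted(xs), sorted(ys)
    # In 1-D the optimal coupling matches sorted values pairwise.
    return sum(abs(a - b) for a, b in zip(xs, ys)) / len(xs)

# Observed lead times (days) on a link vs. two candidate simplified simulations.
observed  = [2, 3, 3, 4, 5, 6, 6, 7]
sim_close = [2, 3, 4, 4, 5, 5, 6, 7]
sim_far   = [5, 6, 7, 8, 9, 10, 11, 12]

print(wasserstein_1d(observed, sim_close))   # 0.25: within an admissible bound
print(wasserstein_1d(observed, sim_far))     # 4.0: far outside it
```

An ensemble of simulations can then be filtered or weighted by such distances, so that learned policies are only trusted within the region of model space the data actually supports.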
A central difficulty in this domain lies in reconciling the competing requirements of robustness, scalability, and interpretability. Our framework directly addresses this by incorporating distributionally robust optimisation within the reinforcement-learning process, using optimal transport to quantify and control deviations between modelled and real-world dynamics. Generative and stochastic modelling techniques are further used to create synthetic but physically consistent scenarios, allowing systematic exploration of rare or extreme events that are typically
inaccessible to data-driven models. Beyond the development of the theoretical foundations, the project involves constructing GPU-accelerated simulation environments and designing quantitative validation protocols that benchmark the resulting algorithms against established methods in operations research. This research aims to establish a coherent mathematical foundation for adaptive and sustainable logistics optimisation, bridging the gap between rigorous theory and the complex, uncertain realities of large-scale supply networks.