Pier Giuseppe Sessa




Publications 1 - 10 of 23
  • Sessa, Pier Giuseppe; De Martinis, Valerio; Bomhauer-Beins, Axel; et al. (2021)
    Public Transport
    The next generation of railway systems will require increasingly accurate information for the planning of rail operation. This information is essential for introducing automatic processes for optimized traffic planning, the optimal use of infrastructure capacity and energy, and, overall, data-driven approaches to rail operation. Train trajectory collection constitutes a primary source of information for offline procedures such as timetable generation, driving-behaviour analysis and model calibration. Unfortunately, current train trajectory data are often affected by measurement errors, missing data and, in many cases, incongruence between dependent variables. To overcome this problem, a trajectory reconstruction problem must be solved before the trajectories can be used for any further purpose. In the present paper, a new hybrid stochastic trajectory reconstruction method is proposed. On-board monitoring data on train position and velocity (kinematics) are combined with data on power used for traction and feasible acceleration values (dynamics). The two types of information are fused by considering the stochastic characteristics of the data, via smoothing techniques. A promising potential use is seen especially in those cases where information on continuous train positions is unavailable or unreliable (e.g. tunnels, canyons, etc.).
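    The smoothing-based fusion is only summarized above. As an illustrative stand-in (not the paper's hybrid method, which additionally exploits traction-power and acceleration data), a plain Rauch-Tung-Striebel smoother over noisy position and velocity measurements could look as follows; all names and noise parameters here are hypothetical:

```python
import numpy as np

def smooth_trajectory(pos_meas, vel_meas, dt=1.0, q=0.5, r_pos=4.0, r_vel=1.0):
    """Rauch-Tung-Striebel smoother for a 1-D trajectory.

    State: [position, velocity] with a constant-velocity model; the
    process noise q stands in for unmodelled traction dynamics.
    """
    F = np.array([[1.0, dt], [0.0, 1.0]])              # kinematic transition
    Q = q * np.array([[dt**3 / 3, dt**2 / 2],
                      [dt**2 / 2, dt]])                # process-noise covariance
    H = np.eye(2)                                      # both states observed
    R = np.diag([r_pos, r_vel])                        # measurement noise
    n = len(pos_meas)
    x = np.array([pos_meas[0], vel_meas[0]])
    P = np.eye(2) * 10.0
    xs, Ps, x_pred, P_pred = [], [], [], []
    for k in range(n):                                 # forward (filter) pass
        if k > 0:
            x, P = F @ x, F @ P @ F.T + Q
        x_pred.append(x.copy()); P_pred.append(P.copy())
        z = np.array([pos_meas[k], vel_meas[k]])
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ (z - H @ x)
        P = (np.eye(2) - K @ H) @ P
        xs.append(x.copy()); Ps.append(P.copy())
    for k in range(n - 2, -1, -1):                     # backward (smoothing) pass
        G = Ps[k] @ F.T @ np.linalg.inv(P_pred[k + 1])
        xs[k] = xs[k] + G @ (xs[k + 1] - x_pred[k + 1])
        Ps[k] = Ps[k] + G @ (Ps[k + 1] - P_pred[k + 1]) @ G.T
    return np.array(xs)                                # smoothed [pos, vel] rows
```

    With self-consistent measurements the smoother simply reproduces the trajectory; its value lies in filling gaps and reconciling noisy or conflicting position and velocity readings.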
  • Sessa, Pier Giuseppe; Bogunovic, Ilija; Kamgarpour, Maryam; et al. (2021)
    Advances in Neural Information Processing Systems 33
    We consider a repeated sequential game between a learner, who plays first, and an opponent who responds to the chosen action. We seek to design strategies for the learner to successfully interact with the opponent. While most previous approaches consider known opponent models, we focus on the setting in which the opponent's model is unknown. To this end, we use kernel-based regularity assumptions to capture and exploit the structure in the opponent's response. We propose a novel algorithm for the learner when playing against an adversarial sequence of opponents. The algorithm combines ideas from bilevel optimization and online learning to effectively balance exploration (learning about the opponent's model) and exploitation (selecting highly rewarding actions for the learner). Our results include regret guarantees for the algorithm that depend on the regularity of the opponent's response and scale sublinearly with the number of game rounds. Moreover, we specialize our approach to repeated Stackelberg games and empirically demonstrate its effectiveness on traffic routing and wildlife conservation tasks.
  • Ramesh, Shyam Sundhar; Sessa, Pier Giuseppe; Krause, Andreas; et al. (2022)
    Advances in Neural Information Processing Systems 35
    Contextual Bayesian optimization (CBO) is a powerful framework for sequential decision-making given side information, with important applications, e.g., in wind energy systems. In this setting, the learner receives context (e.g., weather conditions) at each round, and has to choose an action (e.g., turbine parameters). Standard algorithms assume no cost for switching their decisions at every round. However, in many practical applications, there is a cost associated with such changes, which should be minimized. We introduce the episodic CBO with movement costs problem and, based on the online learning approach for metrical task systems of Coester and Lee (2019), propose a novel randomized mirror descent algorithm that makes use of Gaussian Process confidence bounds. We compare its performance with the offline optimal sequence for each episode and provide rigorous regret guarantees. We further demonstrate our approach on the important real-world application of altitude optimization for Airborne Wind Energy Systems. In the presence of substantial movement costs, our algorithm consistently outperforms standard CBO algorithms.
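    As a rough illustration of the idea (not the paper's randomized algorithm, which builds on Coester and Lee's metrical-task-system framework), a single entropic mirror descent step that trades optimistic reward against a switching cost could be written as follows; the step size and cost model are hypothetical:

```python
import numpy as np

def mirror_descent_step(p, ucb_rewards, switch_cost, prev_action, eta=0.3):
    # Entropic mirror descent over a discrete action set: trade off the
    # optimistic (GP upper-confidence) reward of each action against a
    # movement cost for switching away from the previously played action.
    gain = np.asarray(ucb_rewards, dtype=float).copy()
    gain -= switch_cost * (np.arange(len(p)) != prev_action)
    q = np.asarray(p) * np.exp(eta * gain)
    return q / q.sum()  # updated sampling distribution on the simplex
```

    With a large switching cost the update concentrates mass on the previous action unless another action's optimistic reward clearly justifies the move.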
  • Karaca, Orçun; Sessa, Pier Giuseppe; Walton, Neil; et al. (2019)
    IEEE Transactions on Automatic Control
  • Sessa, Pier Giuseppe; Bogunovic, Ilija; Krause, Andreas; et al. (2021)
    Advances in Neural Information Processing Systems 33
    We formulate the novel class of contextual games, repeated games driven by contextual information at each round. By means of kernel-based regularity assumptions, we model the correlation between different contexts and game outcomes and propose a novel online (meta) algorithm that exploits such correlations to minimize the contextual regret of individual players. We define game-theoretic notions of contextual Coarse Correlated Equilibria (c-CCE) and optimal contextual welfare for this new class of games and show that c-CCEs and optimal welfare can be approached whenever players' contextual regrets vanish. Finally, we empirically validate our results in a traffic routing experiment, where our algorithm leads to better performance and higher welfare compared to baselines that do not exploit the available contextual information or the correlations present in the game.
  • Sessa, Pier Giuseppe; Bogunovic, Ilija; Kamgarpour, Maryam; et al. (2020)
    Advances in Neural Information Processing Systems 32
    We consider the problem of learning to play a repeated multi-agent game with an unknown reward function. Single-player online learning algorithms attain strong regret bounds when provided with full information feedback, which unfortunately is unavailable in many real-world scenarios. Bandit feedback alone, i.e., observing outcomes only for the selected action, yields substantially worse performance. In this paper, we consider a natural model where, besides a noisy measurement of the obtained reward, the player can also observe the opponents' actions. This feedback model, together with a regularity assumption on the reward function, allows us to exploit the correlations among different game outcomes by means of Gaussian processes (GPs). We propose a novel confidence-bound based bandit algorithm GP-MW, which utilizes the GP model for the reward function and runs a multiplicative weight (MW) method. We obtain novel kernel-dependent regret bounds that are comparable to the known bounds in the full information setting, while substantially improving upon the existing bandit results. We experimentally demonstrate the effectiveness of GP-MW in random matrix games, as well as real-world problems of traffic routing and movie recommendation. In our experiments, GP-MW consistently outperforms several baselines, while its performance is often comparable to methods that have access to full information feedback.
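    The two ingredients of GP-MW, GP-based optimistic reward estimates and a multiplicative-weights update, can be sketched as follows. This is a schematic under simplifying assumptions, not the paper's exact pseudocode; the kernel, `beta`, and `eta` choices are hypothetical:

```python
import numpy as np

def rbf(X, Y, ls=1.0):
    # Squared-exponential kernel between two sets of row vectors.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

def gp_ucb(X_train, y_train, X_query, beta=2.0, noise=0.1):
    # GP posterior mean and variance, combined into an optimistic
    # upper confidence bound on the unknown reward.
    K = rbf(X_train, X_train) + noise**2 * np.eye(len(X_train))
    k = rbf(X_query, X_train)
    mu = k @ np.linalg.solve(K, y_train)
    var = 1.0 - np.einsum('ij,ji->i', k, np.linalg.solve(K, k.T))
    return mu + beta * np.sqrt(np.maximum(var, 0.0))

def mw_update(weights, optimistic_rewards, eta=0.5):
    # Multiplicative-weights step on the optimistic reward vector.
    w = weights * np.exp(eta * optimistic_rewards)
    return w / w.sum()
```

    At each round the player would query `gp_ucb` at every own action paired with the observed opponent actions, then feed the resulting optimistic reward vector into `mw_update`.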
  • Ramesh, Shyam Sundhar; Hu, Yifan; Chaimalas, Iason; et al. (2024)
    Advances in Neural Information Processing Systems 37
    Adapting large language models (LLMs) for specific tasks usually involves fine-tuning through reinforcement learning with human feedback (RLHF) on preference data. While these data often come from diverse groups of labelers (e.g., different demographics, ethnicities, company teams, etc.), traditional RLHF approaches adopt a "one-size-fits-all" approach, i.e., they indiscriminately assume and optimize a single preference model, and are thus not robust to the unique characteristics and needs of the various groups. To address this limitation, we propose a novel Group Robust Preference Optimization (GRPO) method to align LLMs to individual groups' preferences robustly. Our approach builds upon reward-free direct preference optimization methods, but unlike previous approaches, it seeks a robust policy which maximizes the worst-case group performance. To achieve this, GRPO adaptively and sequentially weights the importance of different groups, prioritizing groups with worse cumulative loss. We theoretically study the feasibility of GRPO and analyze its convergence for the log-linear policy class. By fine-tuning LLMs with GRPO using diverse group-based global opinion data, we significantly improved performance for the worst-performing groups, reduced loss imbalances across groups, and improved probability accuracies compared to non-robust baselines.
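    The adaptive group weighting can be illustrated with a minimal sketch (hypothetical function names; the actual GRPO update and its interaction with the preference-optimization loss are specified in the paper):

```python
import numpy as np

def grpo_group_weights(cum_group_losses, eta=0.1):
    # Exponential-weights step over labeler groups: groups with larger
    # cumulative loss get larger weight, so the policy update focuses on
    # the currently worst-served groups (a soft worst-case objective).
    w = np.exp(eta * np.asarray(cum_group_losses, dtype=float))
    return w / w.sum()

def robust_loss(group_losses, weights):
    # Weighted loss the preference-optimization step would minimize.
    return float(np.dot(weights, group_losses))
```

    Because the weights track cumulative rather than instantaneous losses, a group that is persistently underserved accumulates influence over successive updates.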
  • Sessa, Pier Giuseppe; Kamgarpour, Maryam; Krause, Andreas (2022)
    Proceedings of Machine Learning Research: Proceedings of the 39th International Conference on Machine Learning
    We consider model-based multi-agent reinforcement learning, where the environment transition model is unknown and can only be learned via expensive interactions with the environment. We propose H-MARL (Hallucinated Multi-Agent Reinforcement Learning), a novel sample-efficient algorithm that can efficiently balance exploration, i.e., learning about the environment, and exploitation, i.e., achieving good equilibrium performance in the underlying general-sum Markov game. H-MARL builds high-probability confidence intervals around the unknown transition model and sequentially updates them based on newly observed data. Using these, it constructs an optimistic hallucinated game for the agents for which equilibrium policies are computed at each round. We consider general statistical models (e.g., Gaussian processes, deep ensembles, etc.) and policy classes (e.g., deep neural networks), and theoretically analyze our approach by bounding the agents' dynamic regret. Moreover, we provide a convergence rate to the equilibria of the underlying Markov game. We demonstrate our approach experimentally on an autonomous driving simulation benchmark. H-MARL learns successful equilibrium policies after a few interactions with the environment and can significantly improve the performance compared to non-optimistic exploration methods.
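    The confidence-interval construction can be illustrated with a scalar stand-in; the paper uses richer statistical models such as Gaussian processes or deep ensembles, whereas the class below merely tracks a running mean and variance per state-action pair and exposes a "hallucinated" optimistic transition:

```python
import numpy as np

class TransitionCI:
    # Per (state, action) running estimates of next-state statistics,
    # giving high-probability confidence intervals around the unknown
    # transition model (Welford's online algorithm).
    def __init__(self, beta=2.0):
        self.stats, self.beta = {}, beta

    def update(self, state, action, next_state):
        n, mean, m2 = self.stats.get((state, action), (0, 0.0, 0.0))
        n += 1
        delta = next_state - mean
        mean += delta / n
        m2 += delta * (next_state - mean)
        self.stats[(state, action)] = (n, mean, m2)

    def optimistic_next(self, state, action, eta):
        # Hallucinated dynamics: the planner may choose any transition
        # inside the confidence interval via eta in [-1, 1].
        n, mean, m2 = self.stats[(state, action)]
        sigma = np.sqrt(m2 / n) if n > 1 else 1.0
        return mean + self.beta * sigma * float(np.clip(eta, -1.0, 1.0))
```

    Treating `eta` as an extra decision variable is what makes the hallucinated game optimistic: agents plan as if they could steer the dynamics anywhere inside the current confidence set.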
  • Adversarial Causal Bayesian Optimization
    Sussex, Scott; Sessa, Pier Giuseppe; Makarova, Anastasia; et al. (2024)
    The Twelfth International Conference on Learning Representations
    In Causal Bayesian Optimization (CBO), an agent intervenes on a structural causal model with known graph but unknown mechanisms to maximize a downstream reward variable. In this paper, we consider the generalization where other agents or external events also intervene on the system, which is key for enabling adaptiveness to non-stationarities such as weather changes, market forces, or adversaries. We formalize this generalization of CBO as Adversarial Causal Bayesian Optimization (ACBO) and introduce the first algorithm for ACBO with bounded regret: Causal Bayesian Optimization with Multiplicative Weights (CBO-MW). Our approach combines a classical online learning strategy with causal modeling of the rewards. To achieve this, it computes optimistic counterfactual reward estimates by propagating uncertainty through the causal graph. We derive regret bounds for CBO-MW that naturally depend on graph-related quantities. We further propose a scalable implementation for the case of combinatorial interventions and submodular rewards. Empirically, CBO-MW outperforms non-causal and non-adversarial Bayesian optimization methods on synthetic environments and environments based on real-world data. Our experiments include a realistic demonstration of how CBO-MW can be used to learn users' demand patterns in a shared mobility system and reposition vehicles in strategic areas.
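    A crude sketch of optimistic propagation on a two-node chain x -> y -> r follows, with hypothetical mean and uncertainty functions. For monotone mechanisms taking the upper confidence value at each node suffices; in general, CBO-MW optimizes over the whole confidence set:

```python
def optimistic_reward(x, f_mean, f_sigma, g_mean, g_sigma, beta=2.0):
    # Propagate optimism through a two-node causal chain x -> y -> r:
    # take the upper confidence value of each unknown mechanism in turn.
    y_opt = f_mean(x) + beta * f_sigma(x)         # optimistic intermediate node
    return g_mean(y_opt) + beta * g_sigma(y_opt)  # optimistic reward estimate
```

    The resulting estimates would then feed a multiplicative-weights strategy over interventions, mirroring the online learning component of the algorithm.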
  • Sessa, Pier Giuseppe (2022)
    Several important real-world problems involve multiple entities interacting with each other and can thus be modeled as multi-agent systems. Multi-agent systems are at the core of our society and, due to the recent advances in big data and artificial intelligence, are rapidly permeating into new application domains such as autonomous driving, e-commerce, shared mobility, etc. At the same time, however, such recent progress has brought relevant challenges related to the decision-making, learning, and efficiency of such systems, which make them less understood than their single-agent counterparts. In this thesis, we aim to partially address some of these challenges. The first part of the thesis investigates the problem of sample-efficient active data collection in multi-agent systems, i.e., how agents can acquire new data and learn about the underlying game without sacrificing performance. This problem, also known as the exploration vs. exploitation dilemma, has been extensively studied in single-agent problems but remains fairly unexplored in multi-agent domains. We propose a novel approach to this, which consists of using past observed data to exploit the correlations present in the game by means of statistical regression techniques. This allows the agents to build high-probability confidence intervals around the underlying game rewards and use these to improve their strategy via optimism in the face of uncertainty. We first instantiate this idea in normal-form games and then extend it to a newly defined class of contextual games (where agents observe contextual information before playing), Markov games, and sequential (Stackelberg) games. We provide theoretical regret bounds of the resulting algorithms, yielding provable convergence to equilibria. Moreover, we evaluate our methods in experimental case studies in traffic routing, autonomous driving interactions, and wildlife protection. Our algorithms gradually learn about the underlying game and display a significantly lower regret compared to the existing baselines that utilize solely the obtained game rewards (the so-called bandit feedback). Moreover, they often achieve comparable performance to methods that, unlike ours, require full information about the game.
    In the second part of the thesis, we study the system-level efficiency of multi-agent systems, i.e., the quality of their equilibria (arising from agents' selfishness) with respect to a system-level objective. There is a long history of research that upper bounds their inefficiency, but this has mostly considered games with finite or discrete actions. We extend some of these results to a novel class of continuous action games displaying certain regularity conditions. Moreover, we provide more general efficiency bounds for the case of time-varying contextual games and in the presence of learning agents. Then, motivated by the obtained results and by emerging applications in shared mobility, we consider the problem of designing multi-agent systems to solve hard resource allocation problems (such as rebalancing a bike-sharing system) in a distributed fashion. We propose a novel algorithm for this task and, based on the results obtained in the previous chapters, we provide rigorous convergence and approximation guarantees.