Summary: In this talk, I will summarize some of my work on untrusted data sources, carried out during my PhD at Imperial College London and at IBM Research. The first part of the talk covers indiscriminate data poisoning attacks against supervised learning. I propose a novel attack formulation that accounts for the effect of the attack on the model's hyperparameters, and I apply it to several ML classifiers that use L2 regularization. My evaluation shows the benefits of regularization in mitigating poisoning attacks when the hyperparameters are learned on a trusted dataset. I then introduce a threat model for poisoning attacks against regression models and propose a novel stealthy attack formulated as a multiobjective bilevel optimization problem, where the two objectives are attack effectiveness and detectability. I show experimentally that state-of-the-art defenses do not mitigate these stealthy attacks.

In the second part of the talk, I will present our work on factuality detection and correction, introducing FactReasoner and FactCorrector. FactReasoner is a novel neuro-symbolic factuality assessment framework that employs probabilistic reasoning to evaluate the truthfulness of long-form generated responses: it estimates the posterior probability that each part of the response is supported by evidence retrieved from external knowledge sources. Our experiments demonstrate that FactReasoner often outperforms state-of-the-art prompt-based methods. Finally, FactCorrector is a new post-hoc correction method that leverages structured feedback about the factuality of the original response to generate a correction. Experiments show that FactCorrector significantly improves factual precision while preserving relevance, outperforming strong baselines.
Bio: Javier Carnerero-Cano is a Research Scientist working on trustworthy AI at IBM Research Europe (Ireland), where he leads research on the factuality of long-form LLM answers. He obtained his PhD in AI Security from Imperial College London in 2024, focusing on indiscriminate data poisoning attacks against supervised learning. During his PhD, he also completed a research internship at IBM Research on the security of machine unlearning. He holds an MRes in Machine Learning and Signal Processing, and an MSc and a BEng in Telecommunications Engineering, all from Universidad Carlos III de Madrid (Spain), where he also received the Alumni Excellence Award.