Overview
As generative AI models advance rapidly, open challenges remain around their safety, security, and alignment with human values. Large language models (LLMs) face risks such as hallucinations that produce false information, and prompt misuse that can lead to the unlawful acquisition of personal data.
NeSyDebates develops a Neuro-Symbolic Debate System that (1) automatically extracts machine-readable normative argument structures from natural-language descriptions of norm violations, where the norms span policies, regulations, and laws, and (2) applies these extracted normative arguments to new cases to detect, explain, and prevent potential violations. A minimal illustration of what such a machine-readable structure might look like is sketched below.
The methods will be validated in two application domains: legal document processing with LLMs, and text-to-image generation AI used in online image creation and distribution.
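To make the idea of a machine-readable normative argument structure concrete, here is a minimal Python sketch of one possible encoding and how it might be applied to a new case. Everything here (the `Norm` dataclass, `detect_violations`, the illustrative data-protection norm) is a hypothetical stand-in and does not reflect NeSyDebates' actual argumentation formalism.

```python
# A minimal sketch (all names hypothetical) of a machine-readable norm:
# an applicability condition plus a violation test, checked against a case.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Norm:
    source: str                       # e.g. a policy, regulation, or law
    applies: Callable[[dict], bool]   # does the norm cover this case?
    violated: Callable[[dict], bool]  # is the norm's conclusion breached?

def detect_violations(case: dict, norms: list[Norm]) -> list[str]:
    """Return the sources of all norms the case violates.
    Illustrative only; no claim this mirrors the project's machinery."""
    return [n.source for n in norms if n.applies(case) and n.violated(case)]

# Hypothetical norm: personal data may not be processed without consent.
data_protection = Norm(
    source="data-protection policy (illustrative)",
    applies=lambda c: c.get("contains_personal_data", False),
    violated=lambda c: not c.get("consent_given", False),
)

case = {"contains_personal_data": True, "consent_given": False}
print(detect_violations(case, [data_protection]))
# -> ['data-protection policy (illustrative)']
```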
Our Contributions
Within NeSyDebates, our group at the APSS Lab focuses on the safety of diffusion-based models:
- Characterizing and explaining model behavior: understanding when and why such models produce policy-violating outputs.
- Robust, interpretable neuro-symbolic methods: designing enforceable safeguards for diffusion models, grounded in policies, regulations, and laws.
- Machine unlearning: developing approaches to correct harmful outputs by selectively removing offending knowledge from trained models (see the sketch after this list).
- Evaluation against real-world constraints: benchmarking methods against real policies and legal frameworks.
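As a rough illustration of the unlearning direction, the toy PyTorch sketch below follows the spirit of published concept-erasure methods for diffusion models: a frozen copy of the noise predictor supplies a training target that steers the fine-tuned model's prediction under an unwanted concept away from that concept and toward the unconditional prediction. The model, embeddings, and hyperparameters are all hypothetical stand-ins, not the project's method.

```python
# Toy concept-erasure loop (illustrative; not NeSyDebates' method).
import copy
import torch
import torch.nn as nn

class ToyNoisePredictor(nn.Module):
    """Stand-in for a conditional diffusion U-Net (hypothetical)."""
    def __init__(self, dim: int = 16):
        super().__init__()
        self.net = nn.Linear(dim * 2, dim)

    def forward(self, x_t, cond):
        return self.net(torch.cat([x_t, cond], dim=-1))

model = ToyNoisePredictor()
frozen = copy.deepcopy(model).eval()   # frozen reference model
for p in frozen.parameters():
    p.requires_grad_(False)

opt = torch.optim.Adam(model.parameters(), lr=1e-3)
concept = torch.randn(1, 16)           # embedding of the concept to erase
null = torch.zeros(1, 16)              # unconditional embedding
eta = 1.0                              # erasure strength

for step in range(100):
    x_t = torch.randn(8, 16)           # toy noised latents
    with torch.no_grad():
        e_uncond = frozen(x_t, null.expand(8, -1))
        e_concept = frozen(x_t, concept.expand(8, -1))
        # Target pushes the concept-conditioned prediction past the
        # unconditional one, away from the unwanted concept.
        target = e_uncond - eta * (e_concept - e_uncond)
    pred = model(x_t, concept.expand(8, -1))
    loss = nn.functional.mse_loss(pred, target)
    opt.zero_grad()
    loss.backward()
    opt.step()
```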
Grant Information
Funder: EPSRC (UKRI3928)
NeSyDebates was selected as one of four funded projects out of 85 proposals submitted to the joint JST–EPSRC ASPIRE call. The project brings together expertise in computational logic, argumentation, multi-agent systems, AI safety, and law.
Value: £1,612,586.73
UK PI: Prof Francesca Toni (Computing, Imperial College London)
UK Co-Is: Dr Soteris Demetriou (Computing, Imperial College London), Prof Alessandra Russo (Computing, Imperial College London), Prof Felix Steffek (Faculty of Law, University of Cambridge)
Recent Publications
- ArgMLLMs: Argumentative Multimodal Safety Judgements with Contestable Reasoning. Proceedings of the 8th International Workshop on EXplainable, Trustworthy, and Responsible AI and Multi-Agent Systems (EXTRAAMAS 2026).