Evaluating and Finetuning LLMs for Multilingual Legal Summarisation

Headnotes are concise summaries of court decisions that distill key legal points, enabling legal practitioners to navigate complex rulings efficiently. In this work, we evaluate the performance of the open-source, Swiss-focused Apertus family of models on the task of summarising Swiss landmark decisions. Our experiments establish the fully fine-tuned Apertus 8B model as the top performer within the Apertus family. It sets a robust open-weight baseline that is competitive with leading proprietary models on key lexical similarity metrics. However, we identify a significant adaptation challenge: while fine-tuning successfully teaches the models to adopt the specific structural format of Swiss headnotes, they often struggle to preserve the underlying legal reasoning. This results in a performance "regret" in which models prioritise superficial stylistic alignment over logical coherence, particularly in cross-lingual settings. Our code and fine-tuned checkpoints are publicly available.