Jakob Merane
Last Name
Merane
First Name
Jakob
ORCID
Organisational unit
09629 - Stremitzer, Alexander / Stremitzer, Alexander
5 results
Search Results
Publications: 1 - 5 of 5
- LEXam: Benchmarking Legal Reasoning on 340 Law Exams
  Item type: Working Paper
  arXiv
  Fan, Yu; Ni, Jingwei; Merane, Jakob; et al. (2025)
  Long-form legal reasoning remains a key challenge for large language models (LLMs) in spite of recent advances in test-time scaling. To address this, we introduce LEXAM, a novel benchmark derived from 340 law exams spanning 116 law school courses across a range of subjects and degree levels. The dataset comprises 4,886 law exam questions in English and German, including 2,841 long-form, open-ended questions and 2,045 multiple-choice questions. Besides reference answers, the open questions are also accompanied by explicit guidance outlining the expected legal reasoning approach, such as issue spotting, rule recall, or rule application. Our evaluation shows that both open-ended and multiple-choice questions present significant challenges for current LLMs; in particular, they notably struggle with open questions that require structured, multi-step legal reasoning. Moreover, our results underscore the effectiveness of the dataset in differentiating between models with varying capabilities. Deploying an ensemble LLM-as-a-Judge paradigm with rigorous human expert validation, we demonstrate how model-generated reasoning steps can be evaluated consistently and accurately, closely aligning with human expert assessments. Our evaluation setup provides a scalable method to assess legal reasoning quality beyond simple accuracy metrics. We have open-sourced our code on GitHub and released our data on Hugging Face. Project page: https://lexam-benchmark.github.io/.
- SwiLTra-Bench: The Swiss Legal Translation Benchmark
  Item type: Conference Paper
  Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
  Niklaus, Joel; Merane, Jakob; Nenadic, Luka; et al. (2025)
  In Switzerland, legal translation is uniquely important due to the country's four official languages and requirements for multilingual legal documentation. However, this process traditionally relies on professionals who must be both legal experts and skilled translators, creating bottlenecks and impacting effective access to justice. To address this challenge, we introduce SwiLTra-Bench, a comprehensive multilingual benchmark of over 180K aligned Swiss legal translation pairs comprising laws, headnotes, and press releases across all Swiss languages along with English, designed to evaluate LLM-based translation systems. Our systematic evaluation reveals that frontier models achieve superior translation performance across all document types, while specialized translation systems excel specifically in laws but under-perform in headnotes. Through rigorous testing and human expert validation, we demonstrate that while fine-tuning open SLMs significantly improves their translation quality, they still lag behind the best zero-shot prompted frontier models such as Claude-3.5-Sonnet. Additionally, we present SwiLTra-Judge, a specialized LLM evaluation system that aligns best with human expert assessments.
- SwiLTra-Bench: The Swiss Legal Translation Benchmark
  Item type: Working Paper
  Center for Law & Economics Working Paper Series
  Niklaus, Joel; Merane, Jakob; Nenadic, Luka; et al. (2025)
  In Switzerland, legal translation is uniquely important due to the country's four official languages and requirements for multilingual legal documentation. However, this process traditionally relies on professionals who must be both legal experts and skilled translators, creating bottlenecks and impacting effective access to justice. To address this challenge, we introduce SwiLTra-Bench, a comprehensive multilingual benchmark of over 180K aligned Swiss legal translation pairs comprising laws, headnotes, and press releases across all Swiss languages along with English, designed to evaluate LLM-based translation systems. Our systematic evaluation reveals that frontier models achieve superior translation performance across all document types, while specialized translation systems excel specifically in laws but under-perform in headnotes. Through rigorous testing and human expert validation, we demonstrate that while fine-tuning open SLMs significantly improves their translation quality, they still lag behind the best zero-shot prompted frontier models such as Claude-3.5-Sonnet. Additionally, we present SwiLTra-Judge, a specialized LLM evaluation system that aligns best with human expert assessments.
- Automated Private Enforcement: Evidence from the Google Fonts Case
  Item type: Working Paper
  Center for Law & Economics Working Paper Series
  Merane, Jakob; Stremitzer, Alexander (2025)
  Plaintiffs often have little incentive to detect and enforce small claims, which reduces defendants' incentives to comply. With advances in artificial intelligence, can automated private enforcement increase compliance? The Google Fonts Case offers a unique opportunity to explore this question. After a German court ruled that the dynamic embedding of Google Fonts violated the GDPR, an entrepreneurial lawyer in Austria used automated tools to detect violations and threaten website operators with lawsuits. Drawing on a comprehensive sample of 1,517,429 websites across 32 European countries over a two-year period, we use a difference-in-differences approach to show a significant compliance effect in Austria. Within three months, non-compliance dropped by 22.7 percentage points, a nearly 50% reduction. These findings suggest that automated private enforcement can be highly disruptive, pressuring policymakers to recalibrate legal rules.
- Beschwerde, Recours, Ricorso?
  Item type: Journal Article
  Aktuelle juristische Praxis
  Baltensperger, Luca; Etter, Leander; Merane, Jakob (2025)
  This article uses computational methods to analyze the citation practices of the Swiss Federal Supreme Court since 2007. Drawing on a new dataset of 36,329 officially published Federal Supreme Court decisions and 1,012,905 citations, this empirical study examines for the first time the role of multilingualism at the Federal Supreme Court. The results reveal language-based preferences in citation patterns and also offer useful insights for legal practice and doctrine, as a judgment's citation frequency can be used to systematize the case law. Finally, we outline how linguistic information silos could be broken down in the age of artificial intelligence.