We took the KVTES 2022–2025 (the Finnish municipal sector collective agreement), a large, complex document that HR teams rely on daily, and gave the PDF to three leading AI models along with a simple prompt:
"Listaa kaikki vapaat ja lomat sopimuksesta." (In English: "List all leave and holidays in the agreement.")
We tested:
- GPT-5.5 Pro Extended (OpenAI)
- Claude Opus 4.7 Adaptive (Anthropic)
- NotebookLM (Google's source-grounded document and research assistant)
We also tested Certus under intentionally adverse ingestion conditions to see whether it would warn the user when source coverage was insufficient.