A linguistically controlled benchmark and annotation protocol for evaluating language-model behaviour under instruction conflict, embedded commands, quotation, scope ambiguity, deixis, indirect speech acts, and multi-turn agent transcripts.
Status: arXiv preprint.
The Markdown file is an author-manuscript mirror provided for accessibility, search, and machine readability. Use the linked public record as the canonical citation target unless a later publisher version supersedes it.