Token Constraint Decoding (TCD) enforces consistency between token-level predictions at inference time, mitigating the accuracy drops caused by prompt noise.
Method Summary
TCD adds a lightweight penalty term when local token predictions deviate beyond a set tolerance. This acts as an implicit regularizer and requires no model retraining.
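The exact penalty form is not spelled out above, so the following is only a minimal sketch of the general idea: at each decoding step, tokens whose probability deviates from the previous step's distribution by more than a tolerance have their logits reduced. The function name and the `tolerance` and `penalty` parameters are hypothetical, not the paper's definitions.

```python
import numpy as np

def tcd_penalized_logits(logits, prev_probs, tolerance=0.1, penalty=2.0):
    """Sketch of a TCD-style constraint (hypothetical parameterization).

    Tokens whose current probability deviates from `prev_probs` (the
    previous step's distribution) by more than `tolerance` have their
    logits reduced by `penalty` before sampling.
    """
    # Softmax with the usual max-subtraction for numerical stability.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Penalize tokens that drift too far from the previous distribution.
    deviation = np.abs(probs - prev_probs)
    return np.where(deviation > tolerance, logits - penalty, logits)

# Toy usage: only the first token deviates beyond the tolerance,
# so only its logit is pushed down.
logits = np.array([2.0, 0.5, 0.1])
prev = np.array([1 / 3, 1 / 3, 1 / 3])
constrained = tcd_penalized_logits(logits, prev, tolerance=0.3, penalty=2.0)
```

Because the adjustment happens purely on logits at decode time, it slots in front of any sampling strategy without touching model weights, which is what makes the approach inference-only.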
Experimental Scope
Benchmarks: CommonsenseQA, MMLU, MMLU-Pro.
The method, especially when combined with targeted prompt engineering, recovers large absolute accuracy drops under adversarial or noisy benchmark variants (up to +39% for smaller models such as Gemma3 1B).
Insights
The best penalty schedule differs across model families; sweeping the penalty strength reveals a trade-off between robustness and overconfidence.
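The sweep described above can be sketched as a simple grid search. The `evaluate` callback and the toy accuracy/confidence curves below are fabricated for illustration only; in practice `evaluate` would run the benchmark at a given penalty strength and report accuracy alongside a calibration signal such as mean confidence.

```python
def sweep_penalty(penalties, evaluate):
    """Grid-sweep penalty strengths.

    `evaluate` is an assumed callback returning (accuracy, mean_confidence)
    for one penalty setting. Returns the accuracy-maximizing penalty and
    the full results so the robustness-overconfidence trade-off is visible.
    """
    results = {p: evaluate(p) for p in penalties}
    best = max(results, key=lambda p: results[p][0])
    return best, results

def toy_evaluate(p):
    """Fabricated curves: accuracy peaks at a moderate penalty while
    confidence falls monotonically as the penalty grows."""
    accuracy = 0.6 + 0.1 * p - 0.02 * p * p
    confidence = 0.9 - 0.05 * p
    return accuracy, confidence

best, results = sweep_penalty([0, 1, 2, 4], toy_evaluate)
```

Reporting both numbers per setting, rather than accuracy alone, is what exposes the trade-off: the accuracy-optimal penalty may sit past the point where the model's confidence is already poorly calibrated.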
Practical Value
TCD is model-agnostic, inference-only, and deployable in safety-critical QA systems.