Token Constraint Decoding (TCD) enforces consistency between token-level predictions at inference time, mitigating the accuracy drops caused by prompt noise.
Method Summary
TCD adds a lightweight penalty term when local token predictions deviate beyond a set tolerance. This acts as an implicit regularizer and requires no model retraining.
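The exact penalty form is not spelled out above, so the following is only a minimal sketch of the general idea: at each decoding step, tokens whose probability deviates from the previous step's distribution by more than a tolerance have their logits reduced. The function name and the `tolerance` and `penalty` parameters are hypothetical, not the paper's definitions.

```python
import numpy as np

def tcd_penalized_logits(logits, prev_probs, tolerance=0.1, penalty=2.0):
    """Sketch of a TCD-style constraint (hypothetical parameterization).

    Tokens whose current probability deviates from `prev_probs` (the
    previous step's distribution) by more than `tolerance` have their
    logits reduced by `penalty` before sampling.
    """
    # Softmax with the usual max-subtraction for numerical stability.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Penalize tokens that drift too far from the previous distribution.
    deviation = np.abs(probs - prev_probs)
    return np.where(deviation > tolerance, logits - penalty, logits)

# Toy usage: only the first token deviates beyond the tolerance,
# so only its logit is pushed down.
logits = np.array([2.0, 0.5, 0.1])
prev = np.array([1 / 3, 1 / 3, 1 / 3])
constrained = tcd_penalized_logits(logits, prev, tolerance=0.3, penalty=2.0)
```

Because the adjustment happens purely on logits at decode time, it slots in front of any sampling strategy without touching model weights, which is what makes the approach inference-only.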
Experimental Scope
Benchmarks: CommonsenseQA, MMLU, MMLU-Pro.
The method, especially when combined with targeted prompt engineering, recovers large absolute accuracy drops under adversarial or noisy benchmark variants (up to +39% for smaller models such as Gemma3 1B).
Insights
The best penalty schedule differs across model families; sweeping the penalty strength reveals a trade-off between robustness and overconfidence.
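The sweep described above can be sketched as a simple grid search. The `evaluate` callback and the toy accuracy/confidence curves below are fabricated for illustration only; in practice `evaluate` would run the benchmark at a given penalty strength and report accuracy alongside a calibration signal such as mean confidence.

```python
def sweep_penalty(penalties, evaluate):
    """Grid-sweep penalty strengths.

    `evaluate` is an assumed callback returning (accuracy, mean_confidence)
    for one penalty setting. Returns the accuracy-maximizing penalty and
    the full results so the robustness-overconfidence trade-off is visible.
    """
    results = {p: evaluate(p) for p in penalties}
    best = max(results, key=lambda p: results[p][0])
    return best, results

def toy_evaluate(p):
    """Fabricated curves: accuracy peaks at a moderate penalty while
    confidence falls monotonically as the penalty grows."""
    accuracy = 0.6 + 0.1 * p - 0.02 * p * p
    confidence = 0.9 - 0.05 * p
    return accuracy, confidence

best, results = sweep_penalty([0, 1, 2, 4], toy_evaluate)
```

Reporting both numbers per setting, rather than accuracy alone, is what exposes the trade-off: the accuracy-optimal penalty may sit past the point where the model's confidence is already poorly calibrated.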
Practical Value
TCD is model-agnostic, inference-only, and deployable in safety-critical QA systems.