
The Sufficiency-Conciseness Trade-off in LLM Self-Explanation from an Information Bottleneck Perspective

Ali Zahedzadeh, Behnam Bahrak · Feb 15, 2026 · Citations: 0

Abstract

Large Language Models increasingly rely on self-explanations, such as chain-of-thought reasoning, to improve performance on multi-step question answering. While these explanations enhance accuracy, they are often verbose and costly to generate, raising the question of how much explanation is truly necessary. In this paper, we examine the trade-off between sufficiency, defined as the ability of an explanation to justify the correct answer, and conciseness, defined as the reduction in explanation length. Building on the information bottleneck principle, we conceptualize explanations as compressed representations that retain only the information essential for producing correct answers. To operationalize this view, we introduce an evaluation pipeline that constrains explanation length and assesses sufficiency using multiple language models on the ARC Challenge dataset. To broaden the scope, we conduct experiments in both English, using the original dataset, and Persian, a resource-limited language, using a translated version. Our experiments show that more concise explanations often remain sufficient, preserving accuracy while substantially reducing explanation length, whereas excessive compression leads to performance degradation.
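
A minimal sketch of how such a length-constrained sufficiency check could be wired up is shown below. It assumes only a generic `generate(prompt, max_new_tokens)` callable standing in for any LLM backend; the prompts, token budget, and answer parsing are illustrative assumptions, not the authors' exact protocol.

```python
from typing import Callable, Dict, List

# (prompt, max_new_tokens) -> completion text; stands in for any LLM backend.
Generate = Callable[[str, int], str]

def evaluate(items: List[Dict], generate: Generate, budget_tokens: int) -> Dict[str, float]:
    """Two-stage check: (1) produce an explanation under a length budget,
    (2) answer from the explanation alone, then score accuracy and length."""
    correct, total_words = 0, 0
    for item in items:  # item: {"question": str, "choices": {label: text}, "answer": label}
        choices = " ".join(f"({k}) {v}" for k, v in item["choices"].items())
        # Stage 1: length-constrained explanation (the "compression" step).
        explanation = generate(
            f"Question: {item['question']}\nChoices: {choices}\n"
            f"Explain the key reasoning in at most {budget_tokens} tokens.",
            budget_tokens,
        )
        # Stage 2: sufficiency probe -- answer using only the explanation.
        reply = generate(
            f"Explanation: {explanation}\nChoices: {choices}\n"
            "Reply with the single letter of the best answer.",
            4,
        )
        prediction = reply.strip()[:1].upper()
        correct += int(prediction == item["answer"])
        total_words += len(explanation.split())
    n = len(items)
    return {"accuracy": correct / n, "avg_explanation_words": total_words / n}
```

Sweeping `budget_tokens` over a range of values would trace the sufficiency-conciseness curve the abstract describes: accuracy should hold up under moderate budgets and degrade under aggressive ones.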

HFEPX Relevance Assessment

This paper shows a detected evaluation-protocol signal but no explicit human-feedback signal; it is most useful as adjacent context for eval pipeline design.

Eval-Fit Score

25/100 • Low

Treat as adjacent context, not a core eval-method reference.

Human Feedback Signal

Not explicit in abstract metadata

Evaluation Signal

Detected

HFEPX Fit

High-confidence candidate

Human Data Lens

  • Uses human feedback: No
  • Feedback types: None
  • Rater population: Unknown
  • Unit of annotation: Unknown
  • Expertise required: Multilingual
  • Extraction source: Persisted extraction

Evaluation Lens

  • Evaluation modes: Automatic Metrics
  • Agentic eval: Long Horizon
  • Quality controls: Not reported
  • Confidence: 0.55
  • Flags: ambiguous

Protocol And Measurement Signals

Benchmarks / Datasets

ARC-Challenge

Reported Metrics

accuracy, conciseness

Research Brief

Deterministic synthesis

While these explanations enhance accuracy, they are often verbose and costly to generate, raising the question of how much explanation is truly necessary. HFEPX signals include Automatic Metrics and Long Horizon, with confidence 0.55. Updated from the current HFEPX corpus.

Generated Mar 2, 2026, 7:50 PM · Grounded in abstract + metadata only

Key Takeaways

  • While these explanations enhance accuracy, they are often verbose and costly to generate, raising the question of how much explanation is truly necessary.
  • Building on the information bottleneck principle, we conceptualize explanations as compressed representations that retain only the information essential for producing correct answers (the underlying objective is sketched below).
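
For reference, the standard information bottleneck objective behind this framing is written below, reading X as the question, Z as the explanation, and Y as the answer; this mapping is our reading of the abstract, not a formula quoted from the paper.

```latex
% Standard information bottleneck Lagrangian (illustrative, not from the paper):
% compress X into Z while keeping Z predictive of Y; beta trades off the two terms.
\min_{p(z \mid x)} \; I(X; Z) \;-\; \beta \, I(Z; Y)
```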

Researcher Actions

  • Treat this as method context, then pivot to protocol-specific HFEPX hubs.
  • Cross-check benchmark overlap: ARC-Challenge.
  • Validate metric comparability (accuracy, conciseness); see the sketch below.
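
A minimal sketch of the two reported metrics, under the assumption that conciseness is measured as the relative length reduction against unconstrained explanations (the paper's exact definition is not recoverable from the abstract alone):

```python
def accuracy(predictions, golds):
    """Fraction of items answered correctly."""
    return sum(p == g for p, g in zip(predictions, golds)) / len(golds)

def conciseness(constrained_lengths, baseline_lengths):
    """Assumed definition: relative reduction in explanation length
    versus unconstrained baseline explanations (0.0 = no reduction)."""
    return 1.0 - sum(constrained_lengths) / sum(baseline_lengths)
```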

Caveats

  • Generated from title, abstract, and extracted metadata only; full-paper implementation details are not parsed.
  • Extraction confidence is probabilistic and should be validated for critical decisions.

Research Summary

Contribution Summary

  • While these explanations enhance accuracy, they are often verbose and costly to generate, raising the question of how much explanation is truly necessary.
  • Building on the information bottleneck principle, we conceptualize explanations as compressed representations that retain only the information essential for producing correct answers. To operationalize this view, we introduce an evaluation pipeline that constrains explanation length and assesses sufficiency using multiple language models on the ARC-Challenge dataset.
  • Our experiments show that more concise explanations often remain sufficient, preserving accuracy while substantially reducing explanation length, whereas excessive compression leads to performance degradation.

Why It Matters For Eval

  • Building on the information bottleneck principle, we conceptualize explanations as compressed representations that retain only the information essential for producing correct answers. To operationalize this view, we introduce an evaluation pipeline that constrains explanation length and assesses sufficiency using multiple language models on the ARC-Challenge dataset.

Researcher Checklist

  • Gap: Human feedback protocol is explicit

    No explicit human feedback protocol detected.

  • Pass: Evaluation mode is explicit

    Detected: Automatic Metrics

  • Gap: Quality control reporting is present

    No calibration/adjudication/IAA control explicitly detected.

  • Pass: Benchmark or dataset anchors are present

    Detected: ARC-Challenge

  • Pass: Metric reporting is present

    Detected: accuracy, conciseness
