Skip to content
← Back to explorer

Conflict-Aware Fusion: Resolving Logic Inertia in Large Language Models via Structured Cognitive Priors

Qiming Bao, Xiaoxuan Fu, Michael Witbrock · Dec 6, 2025 · Citations: 0

Abstract

Large language models (LLMs) excel at many natural language tasks, yet their reasoning reliability under structured perturbations of rule-based systems remains brittle. We present a controlled evaluation framework consisting of four stress tests: (1) rule deletion (redundant vs. essential); (2) contradictory evidence injection; (3) logic-preserving rewrites; and (4) multi-law equivalence stacking. While representative model families (BERT, Qwen2, and TinyLlama) achieve Acc = 1.0000 on base tasks, our framework reveals a critical failure mode termed Logic Inertia - a total breakdown (Acc = 0.0000) under contradictions, where deductive momentum overrides factual reality. To resolve this, we propose Conflict-Aware Fusion, a framework grounded in the Cognitive Structure Hypothesis which posits that robust reasoning requires an explicit structural inductive bias. By imposing a dual-process architecture that separates premise verification from logical deduction, Conflict-Aware Fusion eliminates logic inertia, achieving 1.0000 accuracy on both base and contradictory stress tests, and significantly enhancing robustness to missing evidence. Our results demonstrate that, for reliable multi-step reasoning, structural verification discipline is as critical as training data scale, providing a blueprint for building robust, contradiction-aware AI systems https://github.com/14H034160212/lemo. See the OpenAI/Evals pull request https://github.com/openai/evals/pull/1622.

Human Data Lens

  • Uses human feedback: No
  • Feedback types: None
  • Rater population: Unknown
  • Unit of annotation: Unknown
  • Expertise required: Law

Evaluation Lens

  • Evaluation modes: Automatic Metrics
  • Agentic eval: Long Horizon
  • Quality controls: Not reported
  • Confidence: 0.45
  • Flags: ambiguous

Research Summary

Contribution Summary

  • Large language models (LLMs) excel at many natural language tasks, yet their reasoning reliability under structured perturbations of rule-based systems remains brittle.
  • We present a controlled evaluation framework consisting of four stress tests: (1) rule deletion (redundant vs.
  • essential); (2) contradictory evidence injection; (3) logic-preserving rewrites; and (4) multi-law equivalence stacking.

Why It Matters For Eval

  • We present a controlled evaluation framework consisting of four stress tests: (1) rule deletion (redundant vs.

Related Papers