SAKE: Structured Agentic Knowledge Extrapolation for Complex LLM Reasoning via Reinforcement Learning
Jiashu He, Jinxuan Fan, Bowen Jiang, Ignacio Houine, Dan Roth, Alejandro Ribeiro · May 21, 2025 · Citations: 0
How to use this page
Low trust
Use this as background context only. Do not make protocol decisions from this page alone.
Best use
Background context only
What to verify
Read the full paper before copying any benchmark, metric, or protocol choices.
Evidence quality
Low
Derived from extracted protocol signals and abstract evidence.
Abstract
Knowledge extrapolation is the process of inferring novel information by combining and extending existing, explicitly available knowledge. It is essential for solving complex questions in specialized domains where retrieving comprehensive external knowledge is impractical. We propose SAKE (Structured Agentic Knowledge Extrapolation), an RL-powered agentic framework that trains LLMs to autonomously retrieve and extrapolate structured knowledge through tool-augmented reinforcement learning. SAKE defines two external KG tools: entity group construction and cross-group triplet retrieval. The model learns to interleave these two retrieval tools during a three-turn rollout: extracting key entities, filtering relevant concept groups, and performing associative reasoning by constructing new triplets through analogy. The entire pipeline is optimized end-to-end with GRPO using a curriculum reward, teaching the model what to retrieve and how to reason over it. Our experiments show that a SAKE fine-tuned Qwen2.5-7B model surpasses GPT-3.5-Turbo with state-of-the-art agentic KG reasoning on both biomedical (75.4% vs. 70.1%) and commonsense (81.3% vs. 74.7%) benchmarks, while reducing token usage by over 90%. These results demonstrate that associative reasoning over incomplete structured knowledge does not require large models with complex, multi-step prompting; it can instead be learned end-to-end by small, open-weight models through reinforcement learning with the right tools and training signal. Our code is available at https://anonymous.4open.science/r/SAKE-7585.
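The three-turn rollout described in the abstract can be sketched in miniature. This is a hedged illustration only: the function names (`build_entity_groups`, `retrieve_cross_group_triplets`, `three_turn_rollout`), the toy knowledge graph, and the "shared neighbor" heuristic for analogy are all assumptions made for this sketch, not the paper's actual tools, which are learned and optimized with GRPO rather than hard-coded.

```python
# Hypothetical sketch of SAKE's three-turn rollout. All names and the
# cross-group heuristic are illustrative assumptions, not the paper's
# implementation (see the anonymous repository for the real code).

def build_entity_groups(question_entities, kg):
    """Tool 1 (assumed): build a concept group per extracted key entity
    from the entity's KG neighborhood."""
    return {e: sorted(kg.get(e, [])) for e in question_entities}

def retrieve_cross_group_triplets(groups):
    """Tool 2 (assumed): construct new (head, relation, tail) triplets
    linking entities whose groups share a neighbor, mimicking
    associative reasoning by analogy."""
    keys = list(groups)
    triplets = []
    for i, head in enumerate(keys):
        for tail in keys[i + 1:]:
            if set(groups[head]) & set(groups[tail]):
                triplets.append((head, "associated_with", tail))
    return triplets

def three_turn_rollout(question_entities, kg):
    # Turn 1: key-entity extraction (taken as given in this sketch).
    entities = question_entities
    # Turn 2: filter relevant concept groups via tool 1.
    groups = build_entity_groups(entities, kg)
    # Turn 3: associative reasoning via tool 2.
    return retrieve_cross_group_triplets(groups)

# Toy KG as adjacency lists of related concepts (fabricated example).
kg = {
    "aspirin": ["cox1", "pain"],
    "ibuprofen": ["cox1", "inflammation"],
}
print(three_turn_rollout(["aspirin", "ibuprofen"], kg))
# → [('aspirin', 'associated_with', 'ibuprofen')]
```

In SAKE itself the model decides what to extract and retrieve at each turn, and the policy is trained end-to-end with a curriculum reward; the fixed heuristic above only shows the shape of the tool interleaving.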