LLMs as Repositories of Factual Knowledge: Limitations and Solutions

Q: How reproducible is "LLMs as Repositories of Factual Knowledge: Limitations and Solutions"?

Estimated time to first reproduction: a few days. Risk flags: Adjacent implementations are not paper-verified. No maintained paper-verified implementation was found; start with the closest related repositories below.

Seyed Mahed Mousavi, Simone Alghisi, Giuseppe Riccardi

Published: Jan 22, 2025

No direct implementation yet

Evidence: Adjacent

Domain fit: AI-core

Verified repos: 0

Core AI workload signals detected from paper context and implementation/artifact evidence.

LLMs' sources of knowledge are data snapshots containing factual information about entities collected at different timestamps and from different media types (e.g. wikis, social media, etc.). Such unstructured knowledge is subject to change due to updates through time from past to present. Equally important are the inconsistencies and inaccuracies occurring in different information sources. Consequently, the model's knowledge about an entity may be perturbed while training over the sequence of snapshots or at inference time, resulting in inconsistent and inaccurate model performance. In this work, we study the appropriateness of Large Language Models (LLMs) as repositories of factual knowledge. We consider twenty-four state-of-the-art LLMs that are either closed-, partially (weights), or fully (weight and training data) open-source. We evaluate their reliability in responding to time-sensitive factual questions in terms of accuracy and consistency when prompts are perturbed. We further evaluate the effectiveness of state-of-the-art methods to improve LLMs' accuracy and consistency. We then propose ENtity-Aware Fine-tuning (ENAF), a soft neurosymbolic approach aimed at providing structured representation of entities during fine-tuning to reduce inconsistencies and improve response stability under prompt variations.
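The abstract evaluates models on accuracy and on consistency under perturbed prompts. Since no paper-verified implementation exists, the following is only a minimal sketch of how such an evaluation loop might look when prototyping a baseline. The metric definitions (substring match for accuracy, majority-agreement share for consistency), the `evaluate` and `normalize` helpers, the `toy_model` stub, and the example facts are all assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter
from typing import Callable, Dict, List


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(text.lower().split())


def evaluate(model: Callable[[str], str], items: List[dict]) -> Dict[str, float]:
    """Score a model on several paraphrased prompts per fact.

    accuracy: share of all prompts whose response contains the gold answer
              (an assumed definition, not the paper's).
    consistency: per fact, the share of responses equal to the most common
              response, averaged over facts (also an assumed definition).
    Each item must contain at least one prompt.
    """
    correct = 0
    total = 0
    agreement = []
    for item in items:
        gold = normalize(item["gold"])
        responses = [normalize(model(p)) for p in item["prompts"]]
        correct += sum(gold in r for r in responses)
        total += len(responses)
        most_common_count = Counter(responses).most_common(1)[0][1]
        agreement.append(most_common_count / len(responses))
    return {
        "accuracy": correct / total,
        "consistency": sum(agreement) / len(agreement),
    }


# Hypothetical stand-in for an LLM call: canned answers keyed by prompt.
CANNED = {
    "Who is the CEO of ExampleCorp?": "Alice Smith",
    "ExampleCorp's chief executive is": "Alice Smith",
    "Name the current CEO of ExampleCorp.": "Bob Jones",
    "What is the capital of Examplestan?": "Exampleville",
    "Examplestan's capital city is": "Exampleville",
    "Name the capital of Examplestan.": "Exampleville",
}


def toy_model(prompt: str) -> str:
    return CANNED.get(prompt, "unknown")


# Made-up facts, each with three prompt perturbations and one gold answer.
ITEMS = [
    {"gold": "Alice Smith", "prompts": list(CANNED)[:3]},
    {"gold": "Exampleville", "prompts": list(CANNED)[3:]},
]

report = evaluate(toy_model, ITEMS)
print(report)
```

To try this against a real model, `toy_model` would be replaced by a function that sends the prompt to the model under test and returns its text response; the scoring rules should then be checked against the paper's own definitions before any numbers are compared.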

Technical details

Canonical key: arxiv-2501.12774

Cache status: Stale (SWR served)

Generated at: Jun 18, 2026, 10:52 AM

Artifact coverage: sparse

HF provider: ok (token)

PWC source used: No

LLM status: not_generated

LLM model: n/a

LLM generated: Unknown

LLM content type: n/a

HF policy: hf-relevance-v27

context only

Benchmarks: thin evidence

Results & Benchmarks

Freshness tier: cold

Direct + Inferred Evidence

Language modeling

130 time-sensitive facts in DyKnow

Accuracy

130

Source: paper fulltext

Benchmark evidence drill-down

1 finding

Audit each benchmark finding before selecting an implementation path. Evidence refs map to the disclosure section below.

Task	Dataset	Metric	Value	Source	Evidence refs
Language modeling	130 time-sensitive facts in DyKnow	Accuracy	130	paper-derived	No explicit refs

Implementation Evidence Summary

Confidence: medium

OpenSPG/KAG is the closest maintained adjacent implementation, matched on title overlap with paper keywords (67%). It is not paper-verified; validate its algorithm and evaluation setup against the paper before trusting reported metrics. Community adoption signal: 8,834 GitHub stars.

Reproduction Risks

Adjacent implementations are not paper-verified
Recommended repository is adjacent and not paper-verified.

Hardware Notes

Expect multi-day setup/compute for meaningful reproduction based on current guidance.

Evidence disclosure

Evidence graph: 3 refs, 3 links.

Utility signals: depth 100/100, grounding 85/100, status high.

Implementation Status

No verified maintained repo

There is no verified maintained implementation yet. Use this baseline plan to decide whether to prototype now or defer.

No maintained paper-verified implementation was found; start with the closest related repositories below.
Compare repo methods against the paper equations/algorithm before trusting metrics.
Create a minimal baseline implementation from the paper and use adjacent repos as references.

Reproduction readiness

No Repo

Last checked: Jun 18, 2026

No verified implementation available

No maintained repository has been identified for this paper. Check adjacent implementations or HF artifacts below.

Closest related implementations

These are not paper-verified. Use them as reference points when no direct implementation is available.

OpenSPG/KAG

Adjacent

Confidence: Medium

Stars: 8,834

Title overlap with paper keywords (67%)

Hugging Face artifacts

No trustworthy direct or curated related Hugging Face artifacts were found yet.

Continue with targeted Hugging Face searches derived from the paper title and method context:

Models

arxiv:2501.12774 LLMs Natural Language Processing

Datasets

arxiv:2501.12774 LLMs dataset

Spaces

arxiv:2501.12774 LLMs demo

Tip: start with models, then check datasets/spaces if you need evaluation data or demos.

Direct artifact matches are currently sparse. Use targeted Hugging Face searches to quickly locate candidate models, datasets, and demos.

Research context

Tasks

Language modeling

Methods

Transformer

Domains

Natural Language Processing

Evaluation & Human Feedback Data

Open this paper in HFEPX to review benchmark signals, evaluation modes, and human-feedback protocol context.
