- OGD4All: A Framework for Accessible Interaction with Geospatial Open Government Data Based on Large Language Models
Michael Siebenmann, Javier Argota Sánchez-Vaquerizo, Stefan Arisona, Krystian Samp, Luis Gisler · Nov 30, 2025
The system combines semantic data retrieval, agentic reasoning for iterative code generation, and secure sandboxed execution that produces verifiable multimodal outputs.
- PEFT-Bench: A Parameter-Efficient Fine-Tuning Methods Benchmark
Robert Belanec, Branislav Pecher, Ivan Srba, Maria Bielikova · Nov 26, 2025
Despite the advances in PEFT methods, current evaluations remain limited (in terms of evaluated models and datasets) and difficult to reproduce.
- The Metaphysics We Train: A Heideggerian Reading of Machine Learning
Heman Shakeri · Nov 25, 2025
Third, AI's lack of existential structure, specifically the absence of Care (Sorge), is genuinely explanatory: it illuminates why AI systems have no internal resources for questioning their own optimization imperatives, and why they optimize…
- Stabilizing Off-Policy Training for Long-Horizon LLM Agent via Turn-Level Importance Sampling and Clipping-Triggered Normalization
Chenliang Li, Adel Elmahdy, Alex Boyd, Zhongruo Wang, Siliang Zeng · Nov 25, 2025
Reinforcement learning (RL) algorithms such as PPO and GRPO are widely used to train large language models (LLMs) for multi-turn agentic tasks.
- CDLM: Consistency Diffusion Language Models For Faster Sampling
Minseo Kim, Chenfeng Xu, Coleman Hooper, Harman Singh, Ben Athiwaratkun · Nov 24, 2025
The full training and evaluation code is available at https://github.com/SqueezeAILab/CDLM.
- Empathetic Cascading Networks: A Multi-Stage Prompting Technique for Reducing Social Biases in Large Language Models
Wangjiaxuan Xin · Nov 24, 2025
This report presents the Empathetic Cascading Networks (ECN) framework, a multi-stage prompting method designed to enhance the empathetic and inclusive capabilities of large language models.
- MUCH: A Multilingual Claim Hallucination Benchmark
Jérémie Dentan, Alexi Canesse, Davide Buscaldi, Aymen Shabou, Sonia Vanier · Nov 21, 2025
We introduce MUCH, the first claim-level uncertainty quantification (UQ) benchmark designed for fair and reproducible evaluation of future methods under realistic conditions.
- Bridging Symbolic Control and Neural Reasoning in LLM Agents: Structured Cognitive Loop with a Governance Layer
Myung Ho Kim · Nov 21, 2025
Large language model agents suffer from fundamental architectural problems: entangled reasoning and execution, memory volatility, and uncontrolled action sequences.
- MoDES: Accelerating Mixture-of-Experts Multimodal Large Language Models via Dynamic Expert Skipping
Yushi Huang, Zining Wang, Zhihang Yuan, Yifu Ding, Ruihao Gong · Nov 19, 2025
Extensive experiments for 3 model series across 13 benchmarks demonstrate that MoDES far outperforms previous approaches.
- From Competition to Coordination: Market Making as a Scalable Framework for Safe and Aligned Multi-Agent LLM Systems
Brendan Gho, Suman Muppavarapu, Afnan Shaik, Tyson Tsay, Atharva Mohan · Nov 18, 2025
As foundation models are increasingly deployed as interacting agents in multi-agent systems, their collective behavior raises new challenges for trustworthiness, transparency, and accountability.