Daily Archive

HFEPX Daily Archive: 2026-02-22

Updated from current HFEPX corpus (Feb 26, 2026). 23 papers are grouped in this daily page. Common evaluation modes: Automatic Metrics, Simulation Env. Frequently cited benchmark: AgentBench. Common metric signal: accuracy. Newest paper in this set is from Feb 22, 2026.

Papers: 23 Last published: Feb 22, 2026 Global RSS

Research Utility Snapshot

Evaluation Modes

Automatic Metrics (21)
Simulation Env (2)

Top Metrics Reported

Accuracy (5)
F1 (2)
Precision (2)
Recall (2)

Papers Published On This Date

Adaptive Data Augmentation with Multi-armed Bandit: Sample-Efficient Embedding Calibration for Implicit Pattern Recognition
Minxue Tang, Yangyang Yu, Aolin Ding, Maziyar Baran Pouyan, Taha Belkhouja Yujia Bao · Feb 22, 2026

Recognizing implicit visual and textual patterns is essential in many real-world applications of modern AI.
PerSoMed: A Large-Scale Balanced Dataset for Persian Social Media Text Classification
Isun Chehreh, Ebrahim Ansari · Feb 22, 2026

Data collection involved 60,000 raw posts from various Persian social media platforms, followed by rigorous preprocessing and hybrid annotation combining ChatGPT-based few-shot prompting with human verification.
Anatomy of Agentic Memory: Taxonomy and Empirical Analysis of Evaluation and System Limitations
Dongming Jiang, Yi Li, Songtao Wei, Jinxin Yang, Ayushi Kishore · Feb 22, 2026

Long Horizon

Agentic memory systems enable large language model (LLM) agents to maintain state across long interactions, supporting long-horizon reasoning and personalization beyond fixed context windows.
Learning to Reason for Multi-Step Retrieval of Personal Context in Personalized Question Answering
Maryam Amirizaniani, Alireza Salemi, Hamed Zamani · Feb 22, 2026

Pairwise Preference Long Horizon

Personalization in Question Answering (QA) requires answers that are both accurate and aligned with users' background, preferences, and historical context.
Retrieval Augmented Enhanced Dual Co-Attention Framework for Target Aware Multimodal Bengali Hateful Meme Detection
Raihan Tanvir, Md. Golam Rabiul Alam · Feb 22, 2026

Hateful content on social media increasingly appears as multimodal memes that combine images and text to convey harmful narratives.
Next Reply Prediction X Dataset: Linguistic Discrepancies in Naively Generated Content
Simon Münker, Nils Schwager, Kai Kugler, Michael Heseltine, Achim Rettinger · Feb 22, 2026

The increasing use of Large Language Models (LLMs) as proxies for human participants in social science research presents a promising, yet methodologically risky, paradigm shift.
TurkicNLP: An NLP Toolkit for Turkic Languages
Sherzod Hakimov · Feb 22, 2026

Natural language processing for the Turkic language family, spoken by over 200 million people across Eurasia, remains fragmented, with most languages lacking unified tooling and resources.
Reasoning Capabilities of Large Language Models. Lessons Learned from General Game Playing
Maciej Świechowski, Adam Żychowski, Jacek Mańdziuk · Feb 22, 2026

The main results indicate that three of the evaluated models generally perform well across most experimental settings, with performance degradation observed as the evaluation horizon increases (i.e., with a higher number of game steps).
Beyond Behavioural Trade-Offs: Mechanistic Tracing of Pain-Pleasure Decisions in an LLM
Francesca Bianco, Derek Shiller · Feb 22, 2026

This work supports a more evidence-driven (a) debate on AI sentience and welfare, and (b) governance when setting policy, auditing standards, and safety safeguards.
Facet-Level Persona Control by Trait-Activated Routing with Contrastive SAE for Role-Playing LLMs
Wenqiu Tang, Zhen Wan, Takahiro Komamizu, Ichiro Ide · Feb 22, 2026

Personality control in Role-Playing Agents (RPAs) is commonly achieved via training-free methods that inject persona descriptions and memory through prompts or retrieval-augmented generation, or via supervised fine-tuning (SFT) on persona-s
VIGiA: Instructional Video Guidance via Dialogue Reasoning and Retrieval
Diogo Glória-Silva, David Semedo, João Maglhães · Feb 22, 2026

Long Horizon

Our evaluation shows that VIGiA outperforms existing state-of-the-art models on all tasks in a conversational plan guidance setting, reaching over 90\% accuracy on plan-aware VQA.
A Dataset for Named Entity Recognition and Relation Extraction from Art-historical Image Descriptions
Stefanie Schneider, Miriam Göldl, Julian Stalter, Ricarda Vollmer · Feb 22, 2026

The dataset is released as UIMA XMI Common Analysis Structure (CAS) files with accompanying images and bibliographic metadata, and can be used to benchmark and fine-tune NER and RE systems, including zero- and few-shot setups with Large Lan
AgenticRAGTracer: A Hop-Aware Benchmark for Diagnosing Multi-Step Retrieval Reasoning in Agentic RAG
Qijie You, Wenkai Yu, Wentao Zhang · Feb 22, 2026

Long Horizon

With the rapid advancement of agent-based methods in recent years, Agentic RAG has undoubtedly become an important research direction.
How Do LLMs Encode Scientific Quality? An Empirical Study Using Monosemantic Features from Sparse Autoencoders
Michael McCoubrey, Angelo Salatino, Francesco Osborne, Enrico Motta · Feb 22, 2026

In recent years, there has been a growing use of generative AI, and large language models (LLMs) in particular, to support both the assessment and generation of scientific work.
Astra: Activation-Space Tail-Eigenvector Low-Rank Adaptation of Large Language Models
Kainan Liu, Yong Zhang, Ning Cheng, Yun Zhu, Yanmeng Wang · Feb 22, 2026

Extensive experiments across natural language understanding (NLU) and natural language generation (NLG) tasks demonstrate that Astra consistently outperforms existing PEFT baselines across 16 benchmarks and even surpasses full fine-tuning (
Value Entanglement: Conflation Between Different Kinds of Good In (Some) Large Language Models
Seong Hah Cho, Junyi Li, Anna Leshinskaya · Feb 22, 2026

Among the characteristics of value representation in humans is that they distinguish among value of different kinds.
TriTopic: Tri-Modal Graph-Based Topic Modeling with Iterative Refinement and Archetypes
Roman Egger · Feb 22, 2026

In benchmarks across 20 Newsgroups, BBC News, AG News, and Arxiv, TriTopic achieves the highest NMI on every dataset (mean NMI 0.575 vs.
Do LLMs and VLMs Share Neurons for Inference? Evidence and Mechanisms of Cross-Modal Transfer
Chenhang Cui, An Zhang, Yuxin Chen, Gelei Deng, Jingnan Zheng · Feb 22, 2026

Long Horizon

Across diverse mathematics and perception benchmarks, SNRF consistently enhances LVLM inference performance while preserving perceptual capabilities.
IAPO: Information-Aware Policy Optimization for Token-Efficient Reasoning
Yinhan He, Yaochen Zhu, Mingjia Shi, Wendy Zheng, Lin Su · Feb 22, 2026

Extensive empirical evaluations demonstrate that information-aware advantage shaping is a powerful and general direction for token-efficient post-training.
Uncovering Context Reliance in Unstructured Knowledge Editing
Zisheng Zhou, Mengqi Zhang, Shiguang Wu, Xiaotian Ye, Chi Zhang · Feb 22, 2026

Evaluations show that COIN reduces Context Reliance by 45.2% and outperforms strong baselines by 23.6% in editing success rate, highlighting the vital role of mitigating Context Reliance for robust editing.
Learning to Detect Language Model Training Data via Active Reconstruction
Junjie Oscar Yin, John X. Morris, Vitaly Shmatikov, Sewon Min, Hannaneh Hajishirzi · Feb 22, 2026

Detecting LLM training data is generally framed as a membership inference attack (MIA) problem.
Capable but Unreliable: Canonical Path Deviation as a Causal Mechanism of Agent Failure in Long-Horizon Tasks
Wilson Y. Lee · Feb 22, 2026

Long Horizon

Why do language agents fail on tasks they are capable of solving?
Benchmark Test-Time Scaling of General LLM Agents
Xiaochuan Li, Ryan Ming, Pranav Setlur, Abhijay Paladugu, Andy Tang · Feb 22, 2026

LLM agents are increasingly expected to function as general-purpose systems capable of resolving open-ended user requests.

HFEPX Daily Archive: 2026-02-22

Research Utility Snapshot

Papers Published On This Date

Recent Daily Archives