HFEPX Archive Slice

HFEPX Daily Archive: 2026-02-26

Updated from current HFEPX corpus (Apr 12, 2026). 154 papers are grouped in this daily page. Common evaluation modes: Automatic Metrics, Simulation Env. Most common rater population: Domain Experts. Common annotation unit: Trajectory. Frequent quality control: Adjudication. Frequently cited benchmark: BrowseComp. Common metric signal: accuracy. Use this page to compare protocol setup, judge behavior, and labeling design decisions before running new eval experiments. Newest paper in this set is from Feb 26, 2026.

Papers: 154 · Last published: Feb 26, 2026

Researcher Quick Triage

Use this archive page for time-slice monitoring: what changed in evaluation methods, metrics, and protocol quality this period. Quality band: High.

Analysis blocks are computed from the loaded sample (60 of 154 papers).

High-Signal Coverage

100.0%

60 / 60 papers are not low-signal flagged.

Benchmark Anchors

8.3%

Papers with benchmark/dataset mentions in extraction output.

Metric Anchors

16.7%

Papers with reported metric mentions in extraction output.

0 papers report explicit quality controls for this archive period.
Prioritize papers with both benchmark and metric anchors for reliable longitudinal comparisons.

Primary action: Use this slice as early signal only; benchmark/metric anchoring is limited for rigorous period-over-period claims.
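
The sample percentages above can be converted back into approximate paper counts. This is a minimal sketch, assuming each percentage is a rounded share of the 60-paper loaded sample; the helper name `implied_count` is hypothetical and not part of HFEPX.

```python
SAMPLE_SIZE = 60  # loaded sample stated on this page (60 of 154 papers)


def implied_count(percent: float, total: int) -> int:
    """Return the nearest whole-paper count for a rounded percentage share."""
    return round(percent / 100 * total)


# Shares reported in the analysis blocks above.
shares = {
    "high_signal": 100.0,
    "benchmark_anchors": 8.3,
    "metric_anchors": 16.7,
}

# Assumption: the page rounds to one decimal, so the counts are inferred, not quoted.
counts = {name: implied_count(pct, SAMPLE_SIZE) for name, pct in shares.items()}
print(counts)
```

Under that assumption, the benchmark and metric anchor shares correspond to small absolute counts, which is consistent with the advice to treat this slice as early signal only.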

Why This Time Slice Matters

13% of papers report explicit human-feedback signals, led by pairwise preferences.
Automatic metrics appear in 23.4% of papers in this hub.
BrowseComp is a recurring benchmark anchor for cross-paper comparisons on this page.

Protocol Takeaways For This Period

The most common quality-control signal is adjudication (0.6% of papers).
Raters are mostly domain experts, and annotation is commonly trajectory-level; use this to scope replication staffing.
Pair this hub with llm_as_judge pages to benchmark automated-vs-human evaluation tradeoffs.

Start Here (Highest-Signal Papers In This Slice)

Ranked by protocol completeness and evidence density for faster period-over-period review.

RLShield: Practical Multi-Agent RL for Financial Cyber Defense with Attack-Surface MDPs and Real-Time Response Orchestration
Feb 26, 2026 · Citations: 0 · Score: 6.0
Eval: Automatic Metrics · Metrics: Cost

SPARTA: Scalable and Principled Benchmark of Tree-Structured Multi-hop QA over Text and Tables
Feb 26, 2026 · Citations: 0 · Score: 6.0
Eval: Automatic Metrics · Metrics: F1

InnerQ: Hardware-aware Tuning-free Quantization of KV Cache for Large Language Models
Feb 26, 2026 · Citations: 0 · Score: 6.0
Eval: Automatic Metrics · Metrics: Accuracy, Precision

IDP Accelerator: Agentic Document Intelligence from Extraction to Compliance Validation
Feb 26, 2026 · Citations: 0 · Score: 5.5
Eval: Automatic Metrics · Metrics: Accuracy, Latency

CiteAudit: You Cited It, But Did You Read It? A Benchmark for Verifying Scientific References in the LLM Era
Feb 26, 2026 · Citations: 0 · Score: 4.5
Eval: Automatic Metrics · Metrics: Accuracy, Faithfulness

A Mixture-of-Experts Model for Multimodal Emotion Recognition in Conversations
Feb 26, 2026 · Citations: 0 · Score: 4.5
Eval: Automatic Metrics · Metrics: F1, F1 weighted

Protocol Matrix (Top 10)

Quickly compare method ingredients across this archive slice.

Paper	Eval Modes	Benchmarks	Metrics	Quality Controls
RLShield: Practical Multi-Agent RL for Financial Cyber Defense with Attack-Surface MDPs and Real-Time Response Orchestration · Feb 26, 2026	Automatic Metrics	APPS	Cost	Not reported
SPARTA: Scalable and Principled Benchmark of Tree-Structured Multi-hop QA over Text and Tables · Feb 26, 2026	Automatic Metrics	DROP	F1	Not reported
InnerQ: Hardware-aware Tuning-free Quantization of KV Cache for Large Language Models · Feb 26, 2026	Automatic Metrics	GSM8K	Accuracy, Precision	Not reported
IDP Accelerator: Agentic Document Intelligence from Extraction to Compliance Validation · Feb 26, 2026	Automatic Metrics	Not reported	Accuracy, Latency	Not reported
CiteAudit: You Cited It, But Did You Read It? A Benchmark for Verifying Scientific References in the LLM Era · Feb 26, 2026	Automatic Metrics	Not reported	Accuracy, Faithfulness	Not reported
A Mixture-of-Experts Model for Multimodal Emotion Recognition in Conversations · Feb 26, 2026	Automatic Metrics	Not reported	F1, F1 weighted	Not reported
Discourse-Aware Dual-Track Streaming Response for Low-Latency Spoken Dialogue Systems · Feb 26, 2026	Automatic Metrics	Not reported	Latency, Jailbreak success rate	Not reported
AgentDropoutV2: Optimizing Information Flow in Multi-Agent Systems via Test-Time Rectify-or-Reject Pruning · Feb 26, 2026	Automatic Metrics	Not reported	Accuracy	Not reported
Spatio-Temporal Token Pruning for Efficient High-Resolution GUI Agents · Feb 26, 2026	Automatic Metrics	Not reported	Precision, Latency	Not reported
ReCoN-Ipsundrum: An Inspectable Recurrent Persistence Loop Agent with Affect-Coupled Control and Mechanism-Linked Consciousness Indicator Assays · Feb 26, 2026	Not reported	DROP	Not reported	Not reported
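
The triage advice on this page is to prioritize papers that have both a benchmark anchor and a metric anchor. A minimal sketch of that filter over the matrix rows above follows, assuming the rows are transcribed by hand; the shortened titles and the helper `has_both_anchors` are illustrative names, not an HFEPX API.

```python
NOT_REPORTED = "Not reported"

# (short title, benchmarks, metrics) transcribed from the Protocol Matrix rows.
# Short titles are abbreviations chosen for this sketch.
rows = [
    ("RLShield", "APPS", "Cost"),
    ("SPARTA", "DROP", "F1"),
    ("InnerQ", "GSM8K", "Accuracy, Precision"),
    ("IDP Accelerator", NOT_REPORTED, "Accuracy, Latency"),
    ("CiteAudit", NOT_REPORTED, "Accuracy, Faithfulness"),
    ("Mixture-of-Experts ERC", NOT_REPORTED, "F1, F1 weighted"),
    ("Dual-Track Streaming Response", NOT_REPORTED, "Latency, Jailbreak success rate"),
    ("AgentDropoutV2", NOT_REPORTED, "Accuracy"),
    ("Spatio-Temporal Token Pruning", NOT_REPORTED, "Precision, Latency"),
    ("ReCoN-Ipsundrum", "DROP", NOT_REPORTED),
]


def has_both_anchors(row: tuple[str, str, str]) -> bool:
    """True when the row names at least one benchmark and at least one metric."""
    _, benchmarks, metrics = row
    return benchmarks != NOT_REPORTED and metrics != NOT_REPORTED


anchored = [title for title, *_ in filter(has_both_anchors, rows)]
print(anchored)
```

Only rows with both cells filled survive the filter, which keeps longitudinal comparisons to papers whose benchmark and metric are both stated in the extraction output.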

Researcher Workflow (Detailed)

Checklist

Gap: Papers with explicit human feedback
Coverage is a replication risk (13% vs 45% target).

Gap: Papers reporting quality controls
Coverage is a replication risk (1.9% vs 30% target).

Gap: Papers naming benchmarks/datasets
Coverage is a replication risk (9.1% vs 35% target).

Moderate: Papers naming evaluation metrics
Coverage is usable but incomplete (23.4% vs 35% target).

Gap: Papers with known rater population
Coverage is a replication risk (8.4% vs 35% target).

Gap: Papers with known annotation unit
Coverage is a replication risk (7.1% vs 35% target).
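
To see which checklist items sit furthest from their targets, the coverage and target figures above can be ranked by the fraction of the target reached. This is a sketch under stated assumptions: the ranking key (coverage divided by target) and the `shortfall` helper are choices made for illustration, and the page does not state how it assigns the Gap and Moderate labels.

```python
# (label, coverage %, target %) copied from the checklist above.
checklist = [
    ("explicit human feedback", 13.0, 45.0),
    ("quality controls", 1.9, 30.0),
    ("benchmarks/datasets", 9.1, 35.0),
    ("evaluation metrics", 23.4, 35.0),
    ("known rater population", 8.4, 35.0),
    ("known annotation unit", 7.1, 35.0),
]


def shortfall(item: tuple[str, float, float]) -> float:
    """Percentage points still missing to reach the target."""
    _, coverage, target = item
    return round(target - coverage, 1)


# Assumption: rank by share of target reached, lowest first.
ranked = sorted(checklist, key=lambda item: item[1] / item[2])

for item in ranked:
    label, coverage, target = item
    print(f"{label}: {coverage}% of a {target}% target, {shortfall(item)} points short")
```

With these figures, quality controls rank as the largest relative gap and evaluation metrics as the smallest, matching the single Moderate entry in the checklist.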

Strengths

This hub still surfaces a concentrated paper set for protocol triage and replication planning.

Known Gaps

Only 1.9% of papers report quality controls; prioritize calibration/adjudication evidence.
Rater population is under-specified (8.4% coverage).
Annotation unit is under-specified (7.1% coverage).

Suggested Next Analyses

Pair this hub with llm_as_judge pages to benchmark automated-vs-human evaluation tradeoffs.
Stratify by benchmark (BrowseComp vs GAIA) before comparing methods.
Track metric sensitivity by reporting both accuracy and cost.

Recommended Queries

Human Eval Protocols
Benchmark Slice: BrowseComp
Metric Slice: accuracy
IAA-Reported Evaluations
Recent High-Signal Papers

Known Limitations

Narrative synthesis is grounded in metadata and abstracts only; full-paper implementation details are not parsed.

Research Utility Snapshot (Detailed)

Evaluation Modes

Automatic Metrics (36)
Simulation Env (7)
Human Eval (1)

Top Metrics

Accuracy (20)
Cost (9)
Latency (5)
Precision (4)

Top Benchmarks

BrowseComp (2)
GAIA (2)
WebShop (2)
ALFWorld (1)

Quality Controls

Adjudication (1)
Calibration (1)
Inter Annotator Agreement Reported (1)

Papers In This Archive Slice

RLShield: Practical Multi-Agent RL for Financial Cyber Defense with Attack-Surface MDPs and Real-Time Response Orchestration
Srikumar Nayak · Feb 26, 2026 · Citations: 0

Multi Agent

This paper proposes RLShield, a practical multi-agent RL pipeline for financial cyber defense.
France or Spain or Germany or France: A Neural Account of Non-Redundant Redundant Disjunctions
Sasha Boguraev, Qing Yao, Kyle Mahowald · Feb 26, 2026 · Citations: 0
Humans and LLMs Diverge on Probabilistic Inferences
Gaurav Kamath, Sreenath Madathil, Sebastian Schuster, Marie-Catherine de Marneffe, Siva Reddy · Feb 26, 2026 · Citations: 0
TaCarla: A comprehensive benchmarking dataset for end-to-end autonomous driving
Tugrul Gorgulu, Atakan Dag, M. Esat Kalfaoglu, Halil Ibrahim Kuru, Baris Can Cam · Feb 26, 2026 · Citations: 0

In addition, many real datasets struggle to evaluate their models, especially for planning tasks, since they lack a proper closed-loop evaluation setup.
IDP Accelerator: Agentic Document Intelligence from Extraction to Compliance Validation
Md Mofijul Islam, Md Sirajus Salekin, Joe King, Priyashree Roy, Vamsi Thilak Gudi · Feb 26, 2026 · Citations: 0

Demonstrations

We present IDP (Intelligent Document Processing) Accelerator, a framework enabling agentic AI for end-to-end document intelligence with four key components: (1) DocSplit, a novel benchmark dataset and multimodal classifier using BIO tagging…
FHIRPath-QA: Executable Question Answering over FHIR Electronic Health Records
Michael Frew, Nishit Bheda, Bryan Tripp · Feb 26, 2026 · Citations: 0

Expert Verification

In this work, we introduce FHIRPath-QA, the first open dataset and benchmark for patient-specific QA that includes open-standard FHIRPath queries over real-world clinical data.
CiteAudit: You Cited It, But Did You Read It? A Benchmark for Verifying Scientific References in the LLM Era
Zhengqing Yuan, Kaiwen Shi, Zheyuan Zhang, Lichao Sun, Nitesh V. Chawla · Feb 26, 2026 · Citations: 0

Multi Agent

Meanwhile, rapidly growing reference lists make manual verification impractical, and existing automated tools remain fragile to noisy and heterogeneous citation formats and lack standardized evaluation.
Truncated Step-Level Sampling with Process Rewards for Retrieval-Augmented Reasoning
Chris Samarinas, Haw-Shiuan Chang, Hamed Zamani · Feb 26, 2026 · Citations: 0

Long Horizon

Second, dense, decomposed process rewards separately evaluate reasoning quality, query quality, and answer correctness on a ternary scale via an LLM judge, providing richer supervision than binary outcome signals or heuristic step-level…
Model Agreement via Anchoring
Eric Eaton, Surbhi Goel, Marcel Hussing, Michael Kearns, Aaron Roth · Feb 26, 2026 · Citations: 0

Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
SeeThrough3D: Occlusion Aware 3D Control in Text-to-Image Generation
Vaibhav Agrawal, Rishubh Parihar, Pradhaan Bhat, Ravi Kiran Sarvadevabhatla, R. Venkatesh Babu · Feb 26, 2026 · Citations: 0

Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
SOTAlign: Semi-Supervised Alignment of Unimodal Vision and Language Models via Optimal Transport
Simon Roschmann, Paul Krzakala, Sonia Mazelet, Quentin Bouniot, Zeynep Akata · Feb 26, 2026 · Citations: 0

Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
EvoX: Meta-Evolution for Automated Discovery
Shu Liu, Shubham Agarwal, Monishwaran Maheswaran, Mert Cemri, Zhifei Li · Feb 26, 2026 · Citations: 0
Scale Can't Overcome Pragmatics: The Impact of Reporting Bias on Vision-Language Reasoning
Amita Kamath, Jack Hessel, Khyathi Chandu, Jena D. Hwang, Kai-Wei Chang · Feb 26, 2026 · Citations: 0

With a set of curated benchmarks, we demonstrate that: (i) VLMs perform poorly on the aforementioned types of reasoning suppressed in the training data by reporting bias; (ii) contrary to popular belief, scaling data size, model size, and…
FlashOptim: Optimizers for Memory Efficient Training
Jose Javier Gonzalez Ortiz, Abhay Gupta, Chris Renard, Davis Blalock · Feb 26, 2026 · Citations: 0
Understanding Usage and Engagement in AI-Powered Scientific Research Tools: The Asta Interaction Dataset
Dany Haddad, Dan Bareket, Joseph Chee Chang, Jay DeYoung, Jena D. Hwang · Feb 26, 2026 · Citations: 0
Bitwise Systolic Array Architecture for Runtime-Reconfigurable Multi-precision Quantized Multiplication on Hardware Accelerators
Yuhao Liu, Salim Ullah, Akash Kumar · Feb 26, 2026 · Citations: 0
Utilizing LLMs for Industrial Process Automation
Salim Fares · Feb 26, 2026 · Citations: 0
Toward Expert Investment Teams:A Multi-Agent LLM System with Fine-Grained Trading Tasks
Kunihiro Miyazaki, Takanobu Kawahara, Stephen Roberts, Stefan Zohren · Feb 26, 2026 · Citations: 0

Pairwise Preference · Multi Agent

While mainstream approaches deploy multi-agent systems mimicking analyst and manager roles, they often rely on abstract instructions that overlook the intricacies of real-world workflows, which can lead to degraded inference performance and…
LLM Novice Uplift on Dual-Use, In Silico Biology Tasks
Chen Bo Calvin Zhang, Christina Q. Knight, Nicholas Kruus, Jason Hausenloy, Pedro Medeiros · Feb 26, 2026 · Citations: 0

Large language models (LLMs) perform increasingly well on biology benchmarks, but it remains unclear whether they uplift novice users -- i.e., enable humans to perform better than with internet-only resources.
Generalized Rapid Action Value Estimation in Memory-Constrained Environments
Aloïs Rautureau, Tristan Cazenave, Éric Piette · Feb 26, 2026 · Citations: 0
Invariant Transformation and Resampling based Epistemic-Uncertainty Reduction
Sha Hu · Feb 26, 2026 · Citations: 0
Evaluating Zero-Shot and One-Shot Adaptation of Small Language Models in Leader-Follower Interaction
Rafael R. Baptista, André de Lima Salgado, Ricardo V. Godoy, Marcelo Becker, Thiago Boaventura · Feb 26, 2026 · Citations: 0
The logic of KM belief update is contained in the logic of AGM belief revision
Giacomo Bonanno · Feb 26, 2026 · Citations: 0

Critique Edit

Denoting the latter by $\mathcal{L}_{AGM}$ and the former by $\mathcal{L}_{KM}$, we show that every axiom of $\mathcal{L}_{KM}$ is a theorem of $\mathcal{L}_{AGM}$.
A Mixture-of-Experts Model for Multimodal Emotion Recognition in Conversations
Soumya Dutta, Smruthi Balaji, Sriram Ganapathy · Feb 26, 2026 · Citations: 0

Experiments on three benchmark datasets (IEMOCAP, MELD, and MOSI) show that our proposal achieves 70.9%, 69.5%, and 87.9% weighted F1-scores respectively, outperforming several baseline speech-text ERC systems.
Conformalized Neural Networks for Federated Uncertainty Quantification under Dual Heterogeneity
Quang-Huy Nguyen, Jiaqi Wang, Wei-Shinn Ku · Feb 26, 2026 · Citations: 0
SPARTA: Scalable and Principled Benchmark of Tree-Structured Multi-hop QA over Text and Tables
Sungho Park, Jueun Kim, Wook-Shin Han · Feb 26, 2026 · Citations: 0

We present SPARTA, an end-to-end construction framework that automatically generates large-scale Table-Text QA benchmarks with lightweight human validation, requiring only one quarter of the annotation time of HybridQA.
ODEBrain: Continuous-Time EEG Graph for Modeling Dynamic Brain Networks
Haohui Jia, Zheng Chen, Lingwei Zhu, Rikuto Kotoge, Jathurshan Pradeepkumar · Feb 26, 2026 · Citations: 0
CXReasonAgent: Evidence-Grounded Diagnostic Reasoning Agent for Chest X-rays
Hyungyung Lee, Hangyul Yoon, Edward Choi · Feb 26, 2026 · Citations: 0
Evaluating Stochasticity in Deep Research Agents
Haotian Zhai, Elias Stengel-Eskin, Pratik Patil, Liu Leqi · Feb 26, 2026 · Citations: 0
Discourse-Aware Dual-Track Streaming Response for Low-Latency Spoken Dialogue Systems
Siyuan Liu, Jiahui Xu, Feng Jiang, Kuang Wang, Zefeng Zhao · Feb 26, 2026 · Citations: 0

Achieving human-like responsiveness is a critical yet challenging goal for cascaded spoken dialogue systems.
Risk-Aware World Model Predictive Control for Generalizable End-to-End Autonomous Driving
Jiangxin Sun, Feng Xue, Teng Long, Chang Liu, Jian-Fang Hu · Feb 26, 2026 · Citations: 0

Demonstrations

Practically, RaWMPC leverages a world model to predict the consequences of multiple candidate actions and selects low-risk actions through explicit risk evaluation.
AgentDropoutV2: Optimizing Information Flow in Multi-Agent Systems via Test-Time Rectify-or-Reject Pruning
Yutong Wang, Siyuan Xiong, Xuebo Liu, Wenkang Zhou, Liang Ding · Feb 26, 2026 · Citations: 0

Multi Agent

We propose AgentDropoutV2, a test-time rectify-or-reject pruning framework designed to dynamically optimize MAS information flow without retraining.
Mitigating Legibility Tax with Decoupled Prover-Verifier Games
Yegon Kim, Juho Lee · Feb 26, 2026 · Citations: 0
A Model-Free Universal AI
Yegon Kim, Juho Lee · Feb 26, 2026 · Citations: 0
Agency and Architectural Limits: Why Optimization-Based Systems Cannot Be Norm-Responsive
Radha Sarma · Feb 26, 2026 · Citations: 0

This paper demonstrates that assumption is formally invalid for optimization-based systems, specifically Large Language Models trained via Reinforcement Learning from Human Feedback (RLHF).
Spatio-Temporal Token Pruning for Efficient High-Resolution GUI Agents
Zhou Xu, Bowen Zhou, Qi Wang, Shuwen Feng, Jingyu Xiao · Feb 26, 2026 · Citations: 0

Web Browsing

Pure-vision GUI agents provide universal interaction capabilities but suffer from severe efficiency bottlenecks due to the massive spatiotemporal redundancy inherent in high-resolution screenshots and historical trajectories.
Scaling Search Relevance: Augmenting App Store Ranking with LLM-Generated Judgments
Evangelia Christakopoulou, Vivekkumar Patel, Hemanth Velaga, Sandip Gaikwad · Feb 26, 2026 · Citations: 0
ReCoN-Ipsundrum: An Inspectable Recurrent Persistence Loop Agent with Affect-Coupled Control and Mechanism-Linked Consciousness Indicator Assays
Aishik Sanyal · Feb 26, 2026 · Citations: 0

Pairwise Preference

Inspired by Humphrey's ipsundrum hypothesis, we implement ReCoN-Ipsundrum, an inspectable agent that extends a ReCoN state machine with a recurrent persistence loop over sensory salience Ns and an optional affect proxy reporting…
MovieTeller: Tool-augmented Movie Synopsis with ID Consistent Progressive Abstraction
Yizhi Li, Xiaohan Chen, Miao Jiang, Wentao Tang, Gaoang Wang · Feb 26, 2026 · Citations: 0
Why Diffusion Language Models Struggle with Truly Parallel (Non-Autoregressive) Decoding?
Pengxiang Li, Dilxat Muhtar, Tianlong Chen, Lu Yin, Shiwei Liu · Feb 26, 2026 · Citations: 0

Across math reasoning benchmarks, NAP yields stronger performance under parallel decoding than DLMs trained on standard long CoT data, with gains growing as parallelism increases.
ColoDiff: Integrating Dynamic Consistency With Content Awareness for Colonoscopy Video Generation
Junhu Fu, Shuyu Liang, Wutong Li, Chen Ma, Peng Huang · Feb 26, 2026 · Citations: 0
InnerQ: Hardware-aware Tuning-free Quantization of KV Cache for Large Language Models
Sayed Mohammadreza Tayaranian Hosseini, Amir Ardakani, Warren J. Gross · Feb 26, 2026 · Citations: 0

Our evaluation experiments on Llama models show that InnerQ maintains a few-shot GSM8K performance comparable to non-quantized KV caches and surpasses prior KV cache quantization methods.
SC-Arena: A Natural Language Benchmark for Single-Cell Reasoning with Knowledge-Augmented Evaluation
Jiahao Zhao, Feng Jiang, Shaowei Qin, Zhonghui Zhang, Junhao Liu · Feb 26, 2026 · Citations: 0
Fine-Tuning Without Forgetting In-Context Learning: A Theoretical Analysis of Linear Attention Models
Chungpa Lee, Jy-yong Sohn, Kangwook Lee · Feb 26, 2026 · Citations: 0

Demonstrations

We show that fine-tuning all attention parameters can harm in-context learning, whereas restricting updates to the value matrix improves zero-shot performance while preserving in-context learning.
ESAA: Event Sourcing for Autonomous Agents in LLM-Based Software Engineering
Elzo Brito dos Santos Filho · Feb 26, 2026 · Citations: 0
MTRAG-UN: A Benchmark for Open Challenges in Multi-Turn RAG Conversations
Sara Rosenthal, Yannis Katsis, Vraj Shah, Lihong He, Lucian Popa · Feb 26, 2026 · Citations: 0

We present MTRAG-UN, a benchmark for exploring open challenges in multi-turn retrieval augmented generation, a popular use of large language models.
Latent Gaussian Splatting for 4D Panoptic Occupancy Tracking
Maximilian Luz, Rohit Mohan, Thomas Nürnberg, Yakov Miron, Daniele Cattaneo · Feb 26, 2026 · Citations: 0
A Decision-Theoretic Formalisation of Steganography With Applications to LLM Monitoring
Usman Anwar, Julianna Piskorz, David D. Baek, David Africa, Jim Weatherall · Feb 26, 2026 · Citations: 0

Our central insight is that steganography creates an asymmetry in usable information between agents who can and cannot decode the hidden content (present within a steganographic signal), and this otherwise latent asymmetry can be inferred…
PATRA: Pattern-Aware Alignment and Balanced Reasoning for Time Series Question Answering
Junkai Lu, Peng Chen, Xingjian Wu, Yang Shu, Chenjuan Guo · Feb 26, 2026 · Citations: 0
Efficient Encoder-Free Fourier-based 3D Large Multimodal Model
Guofeng Mei, Wei Lin, Luigi Riz, Yujiao Wu, Yiming Wang · Feb 26, 2026 · Citations: 0
The Trinity of Consistency as a Defining Principle for General World Models
Jingxuan Wei, Siyuan Li, Yuhang Xu, Zheng Sun, Junjie Jiang · Feb 26, 2026 · Citations: 0

Long Horizon

To complement this conceptual framework, we introduce CoW-Bench, a benchmark centered on multi-frame reasoning and generation scenarios.
On Sample-Efficient Generalized Planning via Learned Transition Models
Nitin Gupta, Vishal Pallagani, John A. Aydin, Biplav Srivastava · Feb 26, 2026 · Citations: 0
Modality Collapse as Mismatched Decoding: Information-Theoretic Limits of Multimodal LLMs
Jayadev Billa · Feb 26, 2026 · Citations: 0

Abstract shows limited direct human-feedback or evaluation-protocol detail; use as adjacent methodological context.
DyGnROLE: Modeling Asymmetry in Dynamic Graphs with Node-Role-Oriented Latent Encoding
Tyler Bonnet, Marek Rei · Feb 26, 2026 · Citations: 0
SvfEye: A Semantic-Visual Fusion Framework with Multi-Scale Visual Context for Multimodal Reasoning
Yuxiang Shen, Hailong Huang, Zhenkun Gao, Xueheng Li, Man Zhou · Feb 26, 2026 · Citations: 0
Multi-Agent Large Language Model Based Emotional Detoxification Through Personalized Intensity Control for Consumer Protection
Keito Inoshita · Feb 26, 2026 · Citations: 0
Automated Vulnerability Detection in Source Code Using Deep Representation Learning
C. Seas, G. Fitzpatrick, J. A. Hamilton, M. C. Carlisle · Feb 26, 2026 · Citations: 0
Devling into Adversarial Transferability on Image Classification: Review, Benchmark, and Evaluation
Xiaosen Wang, Zhijin Ge, Bohan Liu, Zheng Fang, Fengfan Zhou · Feb 26, 2026 · Citations: 0
Three AI-agents walk into a bar . . . . 'Lord of the Flies' tribalism emerges among smart AI-Agents
Dhwanil M. Mori, Neil F. Johnson · Feb 26, 2026 · Citations: 0

Near-future infrastructure systems may be controlled by autonomous AI agents that repeatedly request access to limited resources such as energy, bandwidth, or computing power.
Enhancing CVRP Solver through LLM-driven Automatic Heuristic Design
Zhuoliang Xie, Fei Liu, Zhenkun Wang, Qingfu Zhang · Feb 26, 2026 · Citations: 0
