HFEPX Hub

Coding + Pairwise Preference (Last 120 Days)

Updated from current HFEPX corpus (Apr 17, 2026). 34 papers are grouped in this hub page.

Read Full Context

Updated from current HFEPX corpus (Apr 17, 2026). 34 papers are grouped in this hub page. Common evaluation modes: Automatic Metrics, Human Eval. Most common rater population: Domain Experts. Common annotation unit: Pairwise. Frequent quality control: Calibration. Frequently cited benchmark: APPS. Common metric signal: accuracy. Use this page to compare protocol setup, judge behavior, and labeling design decisions before running new eval experiments. Newest paper in this set is from Mar 25, 2026.

Papers: 34 Last published: Mar 25, 2026 Global RSS Tag RSS

CodingPairwise PreferenceLast 120d

Researcher Quick Triage

This hub is best used for protocol triage and replication planning from abstract-level evidence. Quality band: Developing .

All Sampled Papers (34) Replication-Ready Only (4)

High-Signal Coverage

100.0%

34 / 34 sampled papers are not low-signal flagged.

Replication-Ready Set

Benchmark + metric + eval mode explicitly present.

Judge/Human Comparability

Papers containing both `human_eval` and `llm_as_judge`.

4 papers are replication-ready (benchmark + metric + explicit evaluation mode).
0 papers support judge-vs-human agreement analysis.
1 papers report explicit quality controls (calibration/adjudication/IAA).

Primary action: Start with the top 2 papers in “Start Here”, then validate assumptions in the protocol matrix.

Need evaluators for this research workflow?

Post a Job →

Why This Matters For Eval Research

100% of papers report explicit human-feedback signals, led by pairwise preferences.
automatic metrics appears in 47.1% of papers in this hub.
APPS is a recurring benchmark anchor for cross-paper comparisons in this page.

Protocol Takeaways

Most common quality-control signal is rater calibration (2.9% of papers).
Rater context is mostly domain experts, and annotation is commonly pairwise annotation; use this to scope replication staffing.
Compare papers that report both human_eval and llm_as_judge to quantify judge-human agreement drift.

Benchmark Interpretation

APPS appears in 5.9% of hub papers (2/34); use this cohort for benchmark-matched comparisons.
LiveCodeBench appears in 5.9% of hub papers (2/34); use this cohort for benchmark-matched comparisons.

Metric Interpretation

accuracy is reported in 23.5% of hub papers (8/34); compare with a secondary metric before ranking methods.
cost is reported in 17.6% of hub papers (6/34); compare with a secondary metric before ranking methods.

Researcher Checklist (Expanded)

Researcher Checklist

Strong: Papers with explicit human feedback

Coverage is strong (100% vs 45% target).
Gap: Papers reporting quality controls

Coverage is a replication risk (2.9% vs 30% target).
Moderate: Papers naming benchmarks/datasets

Coverage is usable but incomplete (26.5% vs 35% target).
Strong: Papers naming evaluation metrics

Coverage is strong (50% vs 35% target).
Gap: Papers with known rater population

Coverage is a replication risk (11.8% vs 35% target).
Strong: Papers with known annotation unit

Coverage is strong (35.3% vs 35% target).

Strengths

Strong human-feedback signal (100% of papers).
Contains both human-eval and LLM-as-judge protocols for head-to-head methodology comparison.

Known Gaps

Only 2.9% of papers report quality controls; prioritize calibration/adjudication evidence.
Rater population is under-specified (11.8% coverage).
LLM-as-judge appears without enough inter-annotator agreement reporting.

Suggested Next Analyses

Compare papers that report both human_eval and llm_as_judge to quantify judge-human agreement drift.
Stratify by benchmark (APPS vs LiveCodeBench) before comparing methods.
Track metric sensitivity by reporting both accuracy and cost.
Add inter-annotator agreement checks when reproducing these protocols.

Recommended Queries (Expanded)

Recommended Queries

Judge vs Human Agreement Benchmark Slice: APPS Metric Slice: accuracy Recent High-Signal Papers

Start with These 3

Use these when you need one protocol anchor, one benchmark anchor, and one recent comparison point before reading the wider hub.

Strongest protocol reference

Do Phone-Use Agents Respect Your Privacy?

Highest protocol score with explicit human/eval signal plus APPS.

Strongest benchmark reference

CausalRM: Causal-Theoretic Reward Modeling for RLHF from Observationa…

Harmbench with cost gives a fast comparison anchor.

Strongest recent paper

Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Para…

Useful for current practice scanning; published Feb 11, 2026.

Start Here (Best First 6)

Ranked for protocol completeness (human signal, benchmark + metric anchors, quality controls, and judge/human overlap).

Do Phone-Use Agents Respect Your Privacy?
Apr 1, 2026 · Citations: 0 · Score: 8.0

HF: Pairwise Preference · Eval: Automatic Metrics · Benchmark: APPS · Metric: Task success
CausalRM: Causal-Theoretic Reward Modeling for RLHF from Observational User Feedbacks
Mar 19, 2026 · Citations: 0 · Score: 8.0

HF: Pairwise Preference · Eval: Automatic Metrics · Benchmark: Harmbench · Metric: Cost
Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters
Feb 11, 2026 · Citations: 0 · Score: 7.5

HF: Pairwise Preference · Eval: Not reported · Benchmark: LiveCodeBench · Metric: Latency
Modeling and Benchmarking Spoken Dialogue Rewards with Modality and Colloquialness
Mar 16, 2026 · Citations: 0 · Score: 7.5

HF: Pairwise Preference · Eval: Automatic Metrics · Benchmark: Esdr Bench · Metric: Accuracy
$V_1$: Unifying Generation and Self-Verification for Parallel Reasoners
Mar 4, 2026 · Citations: 0 · Score: 7.5

HF: Pairwise Preference · Eval: Automatic Metrics · Benchmark: SWE Bench · Metric: Pass@1
RewardUQ: A Unified Framework for Uncertainty-Aware Reward Models
Feb 27, 2026 · Citations: 0 · Score: 7.0

HF: Pairwise Preference · Eval: Automatic Metrics · Benchmark: Not Reported · Metric: Accuracy

Protocol Matrix (Top 12)

Use this to quickly compare protocol ingredients instead of scanning long prose.

Paper	HF Signal	Eval Modes	Benchmarks	Metrics	QC
Do Phone-Use Agents Respect Your Privacy? Apr 1, 2026	Yes Pairwise Preference	Automatic Metrics	APPS , Myphonebench	Task success	Not Reported
CausalRM: Causal-Theoretic Reward Modeling for RLHF from Observational User Feedbacks Mar 19, 2026	Yes Pairwise Preference	Automatic Metrics	Harmbench	Not Reported	Not Reported
Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters Feb 11, 2026	Yes Pairwise Preference	Not Reported	LiveCodeBench , BrowseComp	Not Reported	Not Reported
Modeling and Benchmarking Spoken Dialogue Rewards with Modality and Colloquialness Mar 16, 2026	Yes Pairwise Preference	Automatic Metrics	Esdr Bench	Accuracy	Not Reported
$V_1$: Unifying Generation and Self-Verification for Parallel Reasoners Mar 4, 2026	Yes Pairwise Preference	Automatic Metrics	SWE Bench , AIME	Pass@1	Not Reported
RewardUQ: A Unified Framework for Uncertainty-Aware Reward Models Feb 27, 2026	Yes Pairwise Preference	Automatic Metrics	Not Reported	Accuracy	Calibration
VehicleMemBench: An Executable Benchmark for Multi-User Long-Term Memory in In-Vehicle Agents Mar 25, 2026	Yes Pairwise Preference	Simulation Env	Vehiclemembench	Not Reported	Not Reported
IntelliAsk: Learning to Ask High-Quality Research Questions via RLVR Jan 23, 2026	Yes Pairwise Preference , Expert Verification	Human Eval	Writingbench	Not Reported	Not Reported
ChartEditBench: Evaluating Grounded Multi-Turn Chart Editing in Multimodal Language Models Feb 17, 2026	Yes Pairwise Preference	Automatic Metrics	Charteditbench	Not Reported	Not Reported
FEAST: Fully Connected Expressive Attention for Spatial Transcriptomics Mar 26, 2026	Yes Pairwise Preference	Not Reported	Not Reported	Not Reported	Not Reported
IslamicMMLU: A Benchmark for Evaluating LLMs on Islamic Knowledge Mar 24, 2026	Yes Pairwise Preference	Automatic Metrics	Not Reported	Accuracy	Not Reported
From Oracle to Noisy Context: Mitigating Contextual Exposure Bias in Speech-LLMs Mar 25, 2026	Yes Pairwise Preference	Not Reported	Not Reported	Wer , Jailbreak success rate	Not Reported

Protocol Diff (Top Papers)

Fast side-by-side comparison for the highest-ranked papers in this hub.

Signal	Do Phone-Use Agents Respect Your Privacy?	CausalRM: Causal-Theoretic Reward Modeling for RLHF…	Step 3.5 Flash: Open Frontier-Level Intelligence wi…
Human Feedback	Pairwise Preference	Pairwise Preference	Pairwise Preference
Evaluation Modes	Automatic Metrics	Automatic Metrics	Not reported
Benchmarks	APPS, Myphonebench	Harmbench	LiveCodeBench, BrowseComp
Metrics	Task success	Not reported	Not reported
Quality Controls	Not reported	Not reported	Not reported
Rater Population	Unknown	Unknown	Domain Experts
Annotation Unit	Unknown	Unknown	Unknown

Research Utility Snapshot

Human Feedback Mix

Pairwise Preference (34)
Expert Verification (2)
Critique Edit (1)
Rubric Rating (1)

Evaluation Modes

Automatic Metrics (16)
Human Eval (1)
Llm As Judge (1)
Simulation Env (1)

Top Benchmarks

APPS (2)
LiveCodeBench (2)
AIME (1)
BrowseComp (1)

Top Metrics

Accuracy (8)
Cost (6)
Latency (2)
Task success (2)

Rater Population Mix

Domain Experts (3)
Mixed (1)

Quality Controls

Calibration (1)

Coverage diagnostics (sample-based): human-feedback 100.0% · benchmarks 26.5% · metrics 50.0% · quality controls 2.9%.

Top Papers

VehicleMemBench: An Executable Benchmark for Multi-User Long-Term Memory in In-Vehicle Agents
Yuhao Chen, Yi Xu, Xinyun Ding, Xiang Fang, Shuochen Liu · Mar 25, 2026 · Citations: 0

Pairwise Preference Simulation Env Tool Use

With the growing demand for intelligent in-vehicle experiences, vehicle-based agents are evolving from simple assistants to long-term companions.
Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters
Ailin Huang, Ang Li, Aobo Kong, Bin Wang, Binxing Jiao · Feb 11, 2026 · Citations: 0

Pairwise Preference Tool Use

We introduce Step 3.5 Flash, a sparse Mixture-of-Experts (MoE) model that bridges frontier-level agentic intelligence and computational efficiency.
Decoupling Strategy and Execution in Task-Focused Dialogue via Goal-Oriented Preference Optimization
Jingyi Xu, Xingyu Ren, Zhoupeng Shou, Yumeng Zhang, Zhiqiang You · Jan 24, 2026 · Citations: 0

Pairwise Preference Automatic Metrics Long Horizon

To address this, we propose Goal-Oriented Preference Optimization (GOPO), a hierarchical reinforcement learning framework that decouples strategy planning from response generation via an Expert Agent and a Customer Service Agent.
IntelliAsk: Learning to Ask High-Quality Research Questions via RLVR
Karun Sharma, Vidushee Vats, Shengzhi Li, Yuxiang Wang, Zhongtian Sun · Jan 23, 2026 · Citations: 0

Pairwise PreferenceExpert Verification Human Eval

Peer review relies on substantive, evidence-based questions, yet current LLMs generate surface-level queries that perform worse than human reviewer questions in expert evaluation.
RewardUQ: A Unified Framework for Uncertainty-Aware Reward Models
Daniel Yang, Samuel Stante, Florian Redhardt, Lena Libon, Parnian Kassraie · Feb 27, 2026 · Citations: 0

Pairwise Preference Automatic Metrics

Reward models are central to aligning large language models (LLMs) with human preferences.
Vibe Code Bench: Evaluating AI Models on End-to-End Web Application Development
Hung Tran, Langston Nashold, Rayan Krishnan, Antoine Bigeard, Alex Gu · Mar 4, 2026 · Citations: 0

Pairwise Preference Automatic Metrics Web Browsing

We introduce Vibe Code Bench, a benchmark of 100 web application specifications (50 public validation, 50 held-out test) with 964 browser-based workflows comprising 10,131 substeps, evaluated against deployed applications by an autonomous…
Modeling and Benchmarking Spoken Dialogue Rewards with Modality and Colloquialness
Jingyu Lu, Yuhan Wang, Fan Zhuo, Xize Cheng, Changhao Pan · Mar 16, 2026 · Citations: 0

Pairwise Preference Automatic Metrics

To address these challenges, we introduce SDiaReward, an end-to-end multi-turn reward model trained on SDiaReward-Dataset, a novel collection of episode-level preference pairs explicitly targeting these gaps.
$V_1$: Unifying Generation and Self-Verification for Parallel Reasoners
Harman Singh, Xiuyu Li, Kusha Sareen, Monishwaran Maheswaran, Sijun Tan · Mar 4, 2026 · Citations: 0

Pairwise Preference Automatic Metrics

On code generation (LiveCodeBench, CodeContests, SWE-Bench) and math reasoning (AIME, HMMT) benchmarks, V_1-Infer improves Pass@1 by up to 10% over pointwise verification and outperforms recent test-time scaling methods while being…
WebCoderBench: Benchmarking Web Application Generation with Comprehensive and Interpretable Evaluation Metrics
Chenxu Liu, Yingjie Fu, Wei Yang, Ying Zhang, Tao Xie · Jan 5, 2026 · Citations: 0

Pairwise Preference Llm As Judge

However, building a benchmark for LLM-generated web apps remains challenging due to the need for real-world user requirements, generalizable evaluation metrics without relying on ground-truth implementations or test cases, and interpretable…
Do Phone-Use Agents Respect Your Privacy?
Zhengyang Tang, Ke Ji, Xidong Wang, Zihan Ye, Xinyuan Wang · Apr 1, 2026 · Citations: 0

Pairwise Preference Automatic Metrics

We study whether phone-use agents respect privacy while completing benign mobile tasks.
CausalRM: Causal-Theoretic Reward Modeling for RLHF from Observational User Feedbacks
Hao Wang, Licheng Pan, Zhichao Chen, Chunyuan Zheng, Zhixuan Chu · Mar 19, 2026 · Citations: 0

Pairwise Preference Automatic Metrics

Despite the success of reinforcement learning from human feedback (RLHF) in aligning language models, current reward modeling heavily relies on experimental feedback data collected from human annotators under controlled and costly…
Sabiá-4 Technical Report
Thiago Laitz, Thales Sales Almeida, Hugo Abonizio, Roseval Malaquias Junior, Giovana Kerche Bonás · Mar 10, 2026 · Citations: 0

Pairwise Preference Automatic Metrics Tool Use

The models were developed through a four-stage training pipeline: continued pre-training on Portuguese and Brazilian legal corpora, long-context extension to 128K tokens, supervised fine-tuning on instruction data spanning chat, code, legal…
PrivAct: Internalizing Contextual Privacy Preservation via Multi-Agent Preference Training
Yuhan Cheng, Hancheng Ye, Hai Helen Li, Jingwei Sun, Yiran Chen · Feb 14, 2026 · Citations: 0

Pairwise Preference Automatic Metrics Multi Agent

We propose PrivAct, a contextual privacy-aware multi-agent learning framework that internalizes contextual privacy preservation directly into models' generation behavior for privacy-compliant agentic actions.
Surgical Post-Training: Cutting Errors, Keeping Knowledge
Wenye Lin, Kai Han · Mar 2, 2026 · Citations: 0

Pairwise Preference Automatic Metrics

While prior research emphasizes the role of on-policy data in mitigating forgetting, we uncover--and validate both theoretically and empirically--an overlooked yet critical mechanism: the implicit regularization inherent in Direct…
The Vision Wormhole: Latent-Space Communication in Heterogeneous Multi-Agent Systems
Xiaoze Liu, Ruowang Zhang, Weichen Yu, Siheng Xiong, Liu He · Feb 17, 2026 · Citations: 0

Pairwise Preference Multi Agent

Multi-Agent Systems (MAS) powered by Large Language Models have unlocked advanced collaborative reasoning, yet they remain shackled by the inefficiency of discrete text communication, which imposes significant runtime overhead and…
ChartEditBench: Evaluating Grounded Multi-Turn Chart Editing in Multimodal Language Models
Manav Nitin Kapadnis, Lawanya Baghel, Atharva Naik, Carolyn Rosé · Feb 17, 2026 · Citations: 0

Pairwise Preference Automatic Metrics

In practice, users iteratively refine visualizations through multi-turn interactions that require maintaining common ground, tracking prior edits, and adapting to evolving preferences.
FEAST: Fully Connected Expressive Attention for Spatial Transcriptomics
Taejin Jeong, Joohyeok Kim, Jinyeong Kim, Chanyoung Kim, Seong Jae Hwang · Mar 26, 2026 · Citations: 0

Pairwise Preference

To address this, we propose FEAST (Fully connected Expressive Attention for Spatial Transcriptomics), an attention-based framework that models the tissue as a fully connected graph, enabling the consideration of all pairwise interactions.
IslamicMMLU: A Benchmark for Evaluating LLMs on Islamic Knowledge
Ali Abdelaal, Mohammed Nader Al Haffar, Mahmoud Fawzi, Walid Magdy · Mar 24, 2026 · Citations: 0

Pairwise Preference Automatic Metrics

We introduce IslamicMMLU, a benchmark of 10,013 multiple-choice questions spanning three tracks: Quran (2,013 questions), Hadith (4,000 questions), and Fiqh (jurisprudence, 4,000 questions).
Truth as a Compression Artifact in Language Model Training
Konstantin Krestnikov · Mar 12, 2026 · Citations: 0

Pairwise Preference Automatic Metrics

In the random-error setting, models strongly prefer correct completions in paired evaluation: 83.1% accuracy at balanced data and 67.0% even when correct rules appear in only 10% of the corpus.
From Prompting to Preference Optimization: A Comparative Study of LLM-based Automated Essay Scoring
Minh Hoang Nguyen, Vu Hoang Pham, Xuan Thanh Huynh, Phuc Hong Mai, Vinh The Nguyen · Mar 6, 2026 · Citations: 0

Pairwise Preference Automatic Metrics

On this unified benchmark, we evaluate four approaches: (i) encoder-based classification fine-tuning, (ii) zero- and few-shot prompting, (iii) instruction tuning and Retrieval-Augmented Generation (RAG), and (iv) Supervised Fine-Tuning…
Bridging the Multilingual Safety Divide: Efficient, Culturally-Aware Alignment for Global South Languages
Somnath Banerjee, Rima Hazra, Animesh Mukherjee · Feb 14, 2026 · Citations: 0

Pairwise Preference Automatic Metrics

Yet safety pipelines, benchmarks, and alignment still largely target English and a handful of high-resource languages, implicitly assuming safety and factuality ''transfer'' across languages.
Tucano 2 Cool: Better Open Source LLMs for Portuguese
Nicholas Kluge Corrêa, Aniket Sen, Shiza Fatimah, Sophia Falk, Lennard Landgraf · Mar 3, 2026 · Citations: 0

Pairwise Preference Tool Use

Following our previous works, we now extend our dataset, GigaVerbo-v2, to a new degree of quality and scale, while also introducing a new synthetic dataset, GigaVerbo-v2 Synth, aimed at filling missing gaps in GigaVerbo-v2, and two…
Rethinking Metrics for Lexical Semantic Change Detection
Roksana Goworek, Haim Dubossarsky · Feb 17, 2026 · Citations: 0

Pairwise Preference Automatic Metrics

Lexical semantic change detection (LSCD) increasingly relies on contextualised language model embeddings, yet most approaches still quantify change using a small set of semantic change metrics, primarily Average Pairwise Distance (APD) and
From Oracle to Noisy Context: Mitigating Contextual Exposure Bias in Speech-LLMs
Xiaoyong Guo, Nanjie Li, Zijie Zeng, Kai Wang, Hao Huang · Mar 25, 2026 · Citations: 0

Pairwise Preference

We propose a unified training framework to improve robustness under realistic histories: (i) Teacher Error Knowledge by using Whisper large-v3 hypotheses as training-time history, (ii) Context Dropout to regularize over-reliance on history,…
Comparing Developer and LLM Biases in Code Evaluation
Aditya Mittal, Ryan Shar, Zichu Wu, Shyam Agarwal, Tongshuang Wu · Mar 25, 2026 · Citations: 0

Pairwise PreferenceRubric Rating

We present TRACE (Tool for Rubric Analysis in Code Evaluation), a framework that evaluates LLM judges' ability to predict human preferences and automatically extracts rubric items to reveal systematic biases in how humans and models weigh…
From Isolated Scoring to Collaborative Ranking: A Comparison-Native Framework for LLM-Based Paper Evaluation
Pujun Zheng, Jiacheng Yao, Jinquan Zheng, Chenyang Gu, Guoxiu He · Mar 18, 2026 · Citations: 0

Pairwise Preference

Large language models (LLMs) are currently applied to scientific paper evaluation by assigning an absolute score to each paper independently.
PrivMedChat: End-to-End Differentially Private RLHF for Medical Dialogue Systems
Sudip Bhujel · Mar 3, 2026 · Citations: 0

Pairwise PreferenceExpert Verification

To avoid costly clinician labeling, we introduce an annotation-free preference construction strategy that pairs physician responses with filtered non-expert generations.
Why Code, Why Now: Learnability, Computability, and the Real Limits of Machine Learning
Zhimin Zhao · Feb 15, 2026 · Citations: 0

Pairwise Preference

We propose a five-level hierarchy of learnability based on information structure and argue that the ceiling on ML progress depends less on model size than on whether a task is learnable at all.
You Didn't Have to Say It like That: Subliminal Learning from Faithful Paraphrases
Isaia Gisler, Zhonghao He, Tianyi Qiu · Mar 10, 2026 · Citations: 0

Pairwise Preference

We investigate whether transmission occurs through natural language paraphrases with fixed semantic content, and whether content explicitly contradicting the teacher's preference can block it.
Bioalignment: Measuring and Improving LLM Disposition Toward Biological Systems for AI Safety
Trent R Northen, Mingxun Wang · Mar 10, 2026 · Citations: 0

Pairwise Preference

A sample of 5 frontier and 5 open-weight models were measured using 50 curated Bioalignment prompts with a Kelly criterion-inspired evaluation framework.
EstLLM: Enhancing Estonian Capabilities in Multilingual LLMs via Continued Pretraining and Post-Training
Aleksei Dorkin, Taido Purason, Emil Kalbaliyev, Hele-Andra Kuulmets, Marii Ojastu · Mar 2, 2026 · Citations: 0

Pairwise Preference

We subsequently apply supervised fine-tuning, preference optimization, and chat vector merging to introduce robust instruction-following behavior.
gencat: Generative computerized adaptive testing
Wanyong Feng, Andrew Lan · Feb 23, 2026 · Citations: 0

Pairwise Preference

We train the model in a two-step process, first via Supervised Fine-Tuning and then via preference optimization for knowledge-response alignment.
LogitsCoder: Towards Efficient Chain-of-Thought Path Search via Logits Preference Decoding for Code Generation
Jizheng Chen, Weiming Zhang, Xinyi Dai, Weiwen Liu, Kounianhua Du · Feb 15, 2026 · Citations: 0

Pairwise Preference

LogitsCoder iteratively generates and refines reasoning steps by first steering token selection toward statistically preferred patterns via Logits Preference Decoding, then selecting and aggregating diverse reasoning paths using Logits Rank…
Faithful Bi-Directional Model Steering via Distribution Matching and Distributed Interchange Interventions
Yuntai Bao, Xuhong Zhang, Jintao Chen, Ge Su, Yuxiang Cai · Feb 5, 2026 · Citations: 0

Pairwise Preference

We hypothesize that this is because effective steering requires the faithful identification of internal model mechanisms, not the enforcement of external preferences.

Related Hubs

Get Started

Join the #1 Platform for AI Training Talent

Where top AI builders and expert AI Trainers connect to build the future of AI.

Self-Service

Post a Job

Post your project and get a shortlist of qualified AI Trainers and Data Labelers. Hire and manage your team in the tools you already use.

Create Account & Post a Job

Managed Service

For Large Projects

Done-for-You

We recruit, onboard, and manage a dedicated team inside your tools. End-to-end operations for large or complex projects.

Learn About Managed Service

For Freelancers

Join as an AI Trainer

Find AI training and data labeling projects across platforms, all in one place. One profile, one application process, more opportunities.

Join Now