Integration Developer (API Specialist) — LLM Evaluation

Join OpenTrain to train and evaluate AI systems focused on API integrations and interoperability, working remotely 20+ hrs/week as a contractor. Design prompts, assess AI-generated integration plans and payloads, and troubleshoot REST API and webhook workflows for $15–$45/hr.

Generative AI & RLHF

100% Remote · Hourly
Compensation: $15–$45/hr
Eligibility: Worldwide
Experience: Intermediate
Posted: Mar 29, 2026


About OpenTrain

OpenTrain is the #1 platform for finding and building careers in AI training and data labeling. We connect people with projects that teach AI systems how to act, reason, and integrate with real-world software, and our contributors supply the human expertise that modern models learn from.

As an OpenTrain contributor, you’ll join a fast-growing industry where remote, flexible work lets you shape state-of-the-art AI behavior while building skills that translate directly into software and AI careers.

About AI Training Work

AI training (also called data labeling, annotation, or human feedback work) is the human side of building AI: people create, review, and rate examples that models learn from. Projects range from transcribing audio to writing prompts and evaluating model outputs (RLHF).

This role focuses on integrating AI with business systems. You’ll craft prompts, evaluate model-generated integration designs, and validate technical outputs so that models can propose safe, reliable API-driven workflows between SaaS tools and databases.

The Role

We are looking for an Integration Developer (API Specialist) to train and evaluate AI systems in API integration and interoperability scenarios. You will generate prompts that challenge models to design integrations between business systems, evaluate AI-generated integration plans, payloads, and workflows, and ensure robust data transfer and error handling.

This is a part-time contractor role that requires at least 20 hours per week and the ability to work independently. It is an intermediate-level role focused on text-based evaluation, RLHF, and fine-tuning tasks involving programming and function-calling outputs.

What You’ll Do

Create prompts and test scenarios that require models to design integrations between CRMs, ad platforms, email tools, and databases (Airtable, Google Sheets, Notion).
Evaluate AI-generated integration plans, payloads, schemas, and error-handling strategies against technical requirements and best practices.
Annotate and rate model outputs using rubric-based QA for functionality, correctness, and robustness (evaluation rating, RLHF, fine-tuning tasks).
Validate data mappings, schema compatibility, and tradeoffs between real-time and batch processing.
Identify, reproduce, and document API failures or data issues produced by model proposals and suggest mitigations.
Provide clear feedback and examples to improve LLM behavior on function-calling and code-like outputs.

Requirements

You must have hands-on experience building API integrations using REST APIs and webhooks and be comfortable reading and interpreting technical API documentation. English proficiency at B2 or higher is required.

Proven experience with REST API and webhook integration.
Experience connecting SaaS tools and business systems, with examples of complex system integrations you can demonstrate.
Hands-on text annotation, evaluation, or rubric-based QA experience.
Experience evaluating LLM outputs for legal reasoning quality (this specific capability is required).
Knowledge of real-time vs. batch processing tradeoffs, and experience designing data pipelines and schema validation.
Skilled at troubleshooting API failures, debugging data issues, and documenting results.
English proficiency at B2+ level and ability to work independently.
Your CV must be written in English, state your English proficiency level, and include an email address and phone number.

Who Should Apply

This role suits software engineers, integration specialists, and technical evaluators who enjoy bridging systems and teaching models to reason about integrations. Ideal candidates explain technical concepts clearly, enjoy rubric-based QA, and have real-world experience connecting SaaS platforms.

Intermediate-level contributors who want regular, part-time remote work and who have prior experience evaluating or annotating model outputs (especially around programming, function-calling, or legal reasoning) should apply.

Pay, Schedule, Locations, and How to Apply

Compensation is USD $15–$45 per hour. This is a part-time contractor role with an expected commitment of 20+ hours per week. The work is text-focused and covers rubric-based evaluation, fine-tuning, RLHF, and programming/function-calling assessments.

To apply, submit a CV in English that states your English proficiency level and includes your email address and phone number. Highlight projects or examples of API integrations you have built or evaluated.

Employment type: Contractor, Part-time; time requirement: 20+ hours/week.
Data type: Text. Label types: EVALUATION_RATING, FINE_TUNING, RLHF, COMPUTER_PROGRAMMING_CODING, FUNCTION_CALLING.
Platform/tool: provided by the project.
Restricted locations: Applicants in the following locations are not eligible — Iran, Cuba, North Korea, Syria, Sudan, Venezuela, Myanmar, Russia, Belarus, Palestine; Switzerland; China, Taiwan; Kenya; United States states: Alaska, Arkansas, California, Connecticut, Delaware, Georgia, Hawaii, Illinoi