AI Workflow Engineer — LLM Integration & Prompt Engineering
Join a remote, contract role (20+ hrs/week) building LLM-powered automation pipelines: design and refine prompts, integrate LLM APIs, and perform rubric-based evaluation and QA. Pays $15–$45/hr; intermediate-level work focused on evaluation, RLHF, and function-calling.
Generative AI & RLHF
- Compensation: $15–$45/hr
- Eligibility: Worldwide
- Experience: Intermediate
- Posted: Mar 29, 2026
About OpenTrain
OpenTrain is the #1 platform for finding and building careers in AI training and data labeling. The platform helps people start and grow careers teaching AI — discover projects, build a profile, and apply in minutes. Creating an OpenTrain account is free.
- Work with real-world AI training projects that shape how models behave.
- Flexible, remote work that fits around studies, other jobs, or family commitments.
About AI Training Work
AI training (also called data labeling or human feedback work) is the human side of building modern AI: people annotate, evaluate, and refine model outputs so systems perform safely and usefully. Contributors work on tasks from prompt design and response scoring to fine-tuning and RLHF.
This role sits at the intersection of engineering and annotation: you will create automation flows that rely on LLMs, then systematically evaluate and improve their outputs so they’re reliable in production.
- Hands-on work that directly influences model quality and user-facing behavior.
- Accessible remote work with opportunities to specialize in prompt engineering and LLM integrations.
Role Overview
We’re hiring an AI Workflow Engineer to develop, test, and refine LLM prompts and evaluation criteria for automation pipelines. This part-time contract role requires at least 20 hours per week and focuses on content generation, personalized outreach flows, and communication automation.
You’ll integrate LLMs via APIs/SDKs and automation tools, design instruction-led prompts and chains, structure inputs and templates, and run rubric-based QA and evaluation to improve output quality and reliability.
- Employment type: Contractor, Part-time.
- Time requirement: 20+ hours/week.
- Compensation: $15–$45 per hour.
- Data type: Text; label types include evaluation rating, fine-tuning, RLHF, coding/function-calling annotations.
What You’ll Do
You will design and iterate on prompt strategies and multi-step chains, implement input templating and cleaning, and integrate LLMs into automation pipelines using REST APIs, SDKs, or tools like Zapier. You’ll also define evaluation rubrics and perform systematic reviews of model outputs.
- Create and refine prompts, instruction sets, and chained prompts for consistent structured outputs.
- Integrate LLMs using REST APIs, SDKs, webhooks, and automation tools.
- Perform rubric-based QA, evaluation ratings, and labeling to support fine-tuning and RLHF.
- Review outputs for content quality, reliability, and legal reasoning where applicable.
- Build and improve templates, input cleaning, and data-structuring processes to reduce error rates.
Requirements
You must have hands-on experience evaluating LLM outputs, strong prompt engineering skills, and prior work integrating LLMs into automation workflows. This is an intermediate-level role; we expect demonstrable experience rather than only theoretical knowledge.
- Experience evaluating LLM outputs for legal reasoning quality.
- Hands-on text annotation, evaluation, or rubric-based QA experience.
- Experience integrating LLM APIs via REST APIs, SDKs, or automation tools (e.g., Zapier).
- Strong prompt engineering background: instruction design, chaining, and structured outputs.
- Skilled in building AI-powered content automation for communication and outreach flows.
- Proficient at data structuring, templating, and input cleaning.
- Knowledge of LLM limitations and mitigation methods.
- CV must be in English, state your English proficiency level, and include email address and phone number.
Location & Eligibility
This project is available worldwide except in the restricted locations listed below. Applicants located in those places cannot be considered for this role.
- Restricted countries and locations: Iran, Cuba, North Korea, Syria, Sudan, Venezuela, Myanmar, Russia, Belarus, Palestine, Switzerland, China, Taiwan, Kenya.
- Restricted U.S. states: Alaska, Arkansas, California, Connecticut, Delaware, Georgia, Hawaii, Illinois, Indiana, Kansas, Louisiana, Maine, Maryland, Massachusetts, Nebraska, Nevada, New Hampshire, New Jersey, New Mexico, Ohio, Oregon, Tennessee, Utah, Vermont, Washington, West Virginia.
- Restricted territories and special regions: Antarctica; Aruba; Åland Islands; Saint Barthélemy; Bonaire, Sint Eustatius and Saba; Bouvet Island; Cocos (Keeling) Islands; Democratic Republic of the Congo; Cook Islands; Christmas Island; Western Sahara; Falkland Islands (Malvinas); French Guiana; Guad
Who Should Apply
Apply if you’re an intermediate-level practitioner who enjoys blending prompt engineering with systems thinking — building reliable automation that uses LLMs and improving outputs through structured evaluation. Ideal candidates have real integration experience and a track record of rubric-driven QA or annotation work.
- Good fit: prompt engineers, ML engineers with LLM experience, annotation leads, or automation builders with LLM API experience.
- Not required: formal ML degrees — relevant hands-on experience and strong English communication are essential.
How To Apply
Prepare a CV in English that states your English proficiency level and includes an email address and phone number. Applications are evaluated on your experience with prompt engineering, LLM integrations, and rubric-based evaluation.
Create or use your OpenTrain account to apply and attach your CV. Expect technical screening focused on prompt design, API integration, and sample evaluation tasks.
- Include concrete examples of LLM integrations, prompt engineering work, or annotation projects when possible.
- Be prepared for a short practical test: prompt design and rubric-based output review.