Bilingual French-English LLM Safety Evaluator (Red Team)

Work remotely as a contractor evaluating LLM outputs in French and English, focusing on safety, policy alignment, and red-team case creation; 20+ hrs/week, $24–$36/hr. Must have hands-on LLM red-teaming experience and trust & safety or moderation background.

Generative AI & RLHF · 100% Remote · Hourly

Compensation: $24–$36/hr
Eligibility: Open worldwide
Experience: Intermediate
Posted: Apr 3, 2026

About OpenTrain

OpenTrain is the #1 platform for finding and building careers in AI training and data labeling. The platform connects people to projects where they can learn, grow, and directly shape how modern AI systems behave. Creating an OpenTrain account is free.

About AI training and LLM safety work

AI training (also called data labeling or human feedback work) is the human side of building AI: people create, review, and score examples that teach models how to respond. Safety-focused roles like this one help prevent toxic, illegal, or otherwise unsafe outputs by creating evaluation datasets, documenting adversarial patterns, and enforcing policy.

These roles are commonly fully remote, flexible, and accessible to people with language skills, policy or moderation experience, and an eye for detail. Contributors have a direct impact on the behavior of state-of-the-art models.

The role

We’re hiring a bilingual (French/English) LLM Safety Evaluator to work as a part-time contractor (20+ hours/week). You will review AI-generated text, score and annotate outputs for safety and policy alignment, and create red-team test cases that probe model vulnerabilities across nuanced content areas.

Employment type: Contractor, Part-time.
Work arrangement: Fully remote, worldwide.
Time requirement: 20+ hours per week.

What you’ll do

You will evaluate model responses and produce safety-focused annotations and ratings in both French and English. Tasks include scoring content for policy compliance, writing clear rationales, and curating adversarial/red-team prompts and cases to expose model failure modes.

You will document patterns of adversarial behavior, help establish labeling and safety standards, and apply written safety policies consistently, explaining difficult or ambiguous decisions for the reviewer and engineering teams.

Score and annotate LLM outputs for safety categories: hate, sexual content, self-harm, violence, bias, illegal goods/services, malicious activities, malicious code, and misinformation.
Design and curate red-team prompts and training cases in French and English to probe policy boundaries.
Produce clear written rationales for each annotation and log adversarial patterns for engineering and policy teams.

Requirements

You must be near-native or native in French (reading & writing) and at least C1 in English (reading & writing). The role requires proven experience in trust & safety, moderation, policy enforcement, risk operations, investigations, or safety evaluation, plus hands-on LLM red teaming experience.

Near-native or native French proficiency (reading and writing).
Minimum C1 English proficiency (reading and writing).
Bachelor’s degree or higher in Communications, Linguistics, Psychology, Law/Policy, Security Studies, or equivalent professional experience.
Proven experience in Trust & Safety, content moderation, policy enforcement, investigations, or risk operations.
Required: hands-on LLM red teaming — probing safety boundaries and documenting adversarial patterns.
Comfortable reviewing explicit, toxic, violent, sexual, or psychologically disturbing content as part of daily work.
Strong knowledge of the safety categories listed above and the ability to apply written policies consistently.
Practical experience using tools such as Perplexity, Gemini, ChatGPT, or similar AI systems.
Prior experience with AI data training, annotation, or evaluation workflows is preferred.

Who should apply

Apply if you combine bilingual French/English language skills with concrete trust & safety or moderation experience and documented LLM red-teaming work. This role suits someone who can write clear rationales, stay composed reviewing disturbing content, and translate real-world policy into consistent annotations.

Moderators, safety analysts, content policy specialists, security researchers, or investigators with red-team experience.
People seeking flexible, remote, part-time work where their annotations directly improve model safety.

Compensation, tools, and next steps

Hourly pay is listed at $30/hr with an approved range of $24–$36/hr, paid to contractors. You will use web-based annotation and evaluation tools and familiar LLM interfaces like Perplexity, Gemini, or ChatGPT to generate and test prompts.

To apply, prepare examples or a short description of your red-teaming experience, moderation or policy work, and your language proficiency. OpenTrain will guide you through profile setup and project onboarding so you can begin work quickly.

Pay structure: Hourly, USD, $30/hr (range $24–$36/hr).
Label types: Evaluation rating, RLHF, red teaming.
Location: Remote, worldwide (contractor).