Bilingual LLM Safety Evaluator (Hebrew & English)

Join OpenTrain AI as a remote, part-time contractor reviewing and red-teaming LLM outputs in Hebrew and English to find safety failures and produce labeled evaluation data. $26–$38/hr, 20+ hours/week; your feedback will directly shape model safety.

Generative AI & RLHF

100% Remote Hourly · $26–$38/hr

Compensation: $26–$38/hr
Eligibility: Worldwide
Experience: Intermediate
Posted: Apr 3, 2026

About OpenTrain

OpenTrain is the #1 platform for finding and building careers in AI training and data labeling. We connect contributors with hands-on projects that teach and shape how modern AI systems learn, offering flexible remote work that helps you grow in this fast-moving field.

Why AI training and safety work matters

AI training (also called data labeling or annotation) is the human side of building reliable models. Safety evaluation — red-teaming, scoring outputs, and documenting failures — ensures models avoid harmful, misleading, or unsafe responses and improves their behavior for millions of users.

This role puts you on the front line of model safety: the examples, labels, and reports you create will help define policy enforcement and model behavior across future releases.

The role

You will work as a fully remote, hourly contractor reviewing AI-generated text in Hebrew and English, curating adversarial or safety-sensitive examples, rating model outputs, documenting safety failures, and stress-testing models for policy gaps. Tasks may involve explicit or sensitive material and require careful, policy-driven judgment.

This is a part-time role that requires a commitment of at least 20 hours per week. You will deliver labeled evaluation content that is used to improve LLM safety and to support RLHF-style workflows and model safeguards.

Contract type: Contractor, part-time
Time commitment: 20+ hours/week
Data type: Text — evaluation ratings, text generation, RLHF-style work

What you'll do day-to-day

Work hands-on with LLMs to probe their safety boundaries, produce adversarial prompts, and curate examples that expose policy failures. Review and score model outputs against written safety policies and produce clear, actionable documentation of issues.

Collaborate with other reviewers and use the review tooling to tag content across safety categories, explain decisions in ambiguous or edge cases, and help prioritize gaps for remediation and policy updates.

Create and label adversarial prompts and safety-sensitive training examples in Hebrew and English
Score and annotate model outputs for categories like hate, sexual content, self-harm, violence, misinformation, malicious activity, and bias
Document safety failures with clear examples and reasoning to support model fixes
Perform red-teaming and stress tests to reveal policy gaps and adversarial patterns

Requirements

You must meet all required qualifications below. Applications are assessed only on the experience you provide; nothing beyond it is assumed.

This role involves frequent exposure to explicit, toxic, violent, sexual, or psychologically disturbing content; you must be comfortable with, and capable of, reviewing such material professionally.

Near-native or native Hebrew proficiency (reading and writing)
Minimum C1 English proficiency (reading and writing)
Bachelor’s degree or higher in Communications, Linguistics, Psychology, Law/Policy, Security Studies, or equivalent professional experience
Proven experience in Trust & Safety, content moderation, policy enforcement, risk operations, investigations, or safety evaluation
Hands-on LLM red teaming experience, including probing safety boundaries and documenting adversarial patterns
Strong practical knowledge of safety categories (hate/harassment, sexual content, self-harm, violence, bias, illegal goods/services, malicious activities/code, misinformation)
Ability to apply written safety policies consistently and explain decisions clearly in ambiguous cases
Practical experience using tools such as Perplexity, Gemini, ChatGPT, or similar AI systems
Prior experience with AI data training, annotation, or evaluation workflows preferred

Who should apply

This role is a strong fit for intermediate-level Trust & Safety or content-moderation professionals who are bilingual in Hebrew and English and have hands-on experience with LLM behavior and red-teaming.

Apply if you enjoy analytical, detail-oriented work, can write clear rationales for difficult judgments, and want flexible remote contract work that directly influences model safety.

Compensation, schedule, and how to apply

Pay is hourly in USD, with a range of $26–$38/hr (reference rate: $32/hr). This is a part-time contractor role and requires at least 20 hours of availability per week.

OpenTrain contractors work fully remotely. To apply, create or sign in to your OpenTrain account, complete your profile, and submit your application — OpenTrain lets you discover projects and apply quickly. Your application should highlight bilingual experience, Trust & Safety work, and any LLM red-teaming or evaluation projects you’ve completed.

Payment type: Hourly, USD
Hourly range: $26–$38/hr (reference rate: $32/hr)
Workplace: Fully remote, worldwide